10,000 Matching Annotations
  1. Jun 2025
    1. z1 is the x-coordinate, and z2 the y-coordinate, of point A in the red system

      z1 and z2 are the coordinates of point A in the red system. z1 should therefore have a negative sign even though it points to the right, because alpha 1 points to the left.

    1. Summary Document: Emprise (Psychological Hold) and Its Medico-Legal Implications

      This briefing summarizes the key points of the conference titled "Conférence CRIAVS - Emprise", which addresses the complex nature of the relationship of emprise, its psychiatric exploration, and its legal implications.

      The speaker, a psychiatrist, highlights disagreements with earlier approaches and stresses the importance of a holistic understanding of the perpetrator-victim bond.

      1. Preamble: Distinction and Understanding

      The speaker opens by noting a divergence in approach from current law. While the law draws a clear line between victims and perpetrators, psychiatry is interested in the dynamics of the bond between the two.

      • Legal vs. psychiatric approach: "on one side there is the law, which today distinguishes very clearly between victims on one side and perpetrators on the other... I place myself on the side of psychiatry, and on the side of psychiatry there is value in trying to document what happens in the bond between a victim and a perpetrator."

      • Purpose of understanding: Understanding the mechanisms means neither excusing the perpetrator nor blaming the victim, but "identifying better" and "judging better" for medico-legal and therapeutic reasons.

      2. The Complexity of the Perpetrator-Victim Bond

      The bond between perpetrator and victim is intrinsically complex and can even include a "zone of overlap" in which the roles may reverse.

      • Gray zone and role reversal: "there can exist between the victim and the perpetrator a zone of overlap, a gray zone... there are mechanisms of reversal of a sort, meaning that the victim becomes somewhat the perpetrator and the perpetrator becomes somewhat the victim."

      • The same individual can be both: "the victim and the perpetrator can also be one and the same person, meaning that someone may have been a victim and become a perpetrator, and someone may have been a perpetrator and become a victim in turn."

      3. Critique of Current Expert Assessment Practice

      The speaker criticizes the current practice of assigning the expert assessment of the perpetrator to a psychiatrist and that of the victim to a psychologist, when explaining a relationship of emprise requires a comprehensive approach.

      • Split assessment: "the assessment of the perpetrator is entrusted to a psychiatrist... and the assessment of the victim is entrusted to a psychologist."

      • Incoherence of the request: "how do you expect to explain a relationship of emprise if you have examined only one [individual]?"

      • Recommendation: It is "worthwhile for the same professional, or else a team of the same professionals, psychiatrists and psychologists for example, to examine both the perpetrator and the victim."

      4. Psychiatric Assessment: Findings and Limits

      Psychiatric assessment of perpetrators of emprise yields important findings: serious mental disorders and impaired discernment are frequently absent, but specific patterns of personality functioning are evident.

      • Absence of serious diagnoses: "in most cases the perpetrators... have no established mental pathology... nor do we find any serious personality disorder."

      • Discernment not abolished: "in most cases there is no notion of impairment, no notion of abolition of discernment or of control over these acts."

      • Criminological dangerousness: The assessment focuses on the risk of reoffending, identifying factors of good and poor prognosis.

      • Patterns of personality functioning identified:

      • Obsessional: "they are in control of themselves, of their emotions, in control of their environment."

      • Paranoid: With "suspicion of infidelity, everything in the realm of interpretations, everything in the realm of projections."

      • Borderline: Characterized by an alternation, "I merge, I reject, I merge, I reject," and an "abandonment dimension".

      • Rejection of the term "narcissistic pervert": The term "pervert" is not recognized in psychiatry, and the speaker prefers to break down the mechanisms, such as "seduction, denial of otherness... manipulation... transgression."

      • On the victim's side, the assessment seeks to establish a causal link between the emprise and the psychological disorders (depression, anxiety).

      "Vulnerability" is understood in the medico-legal sense (protective measure), although "fragilities" may be noted.

      Developmental trauma: The core of vulnerability lies in "developmental trauma", often linked to early "neglect".

      Maladaptation of the attachment system: This is the "key to the relationship of emprise."

      5. The Stages of the Relationship of Emprise

      The relationship of emprise follows distinct phases, often schematized to make it easier to understand in legal terms:

      • Seduction and initial adherence (love bombing): Compliments, gifts, intense affection, and false empathy that create a "rapid emotional dependence" and a "honeymoon."
      • Confusion and guilt induction: Gradual introduction of controlling behaviors, disguised criticism, unpredictable mood changes, and "cognitive scrambling" (e.g., gaslighting: "I never said that to you, you're making it up"). The victim loses confidence in their own judgment. Guilt over supposed failings sets in.
      • Isolation and control: The perpetrator isolates the victim from those close to them. The victim also isolates themselves out of shame or to avoid conflict, and so loses their external points of reference. Control shows up as monitoring of the victim's movements, phone, and money.
      • Deprivation and threat: Withholding of affection, harassment, and finally direct threats ("if you leave me I will destroy you", "I will kill myself", "you will lose the children"). This is often the point at which the justice system steps in.

      6. The Fundamental Role of Disorganized Attachment

      Attachment is an affective bond essential to human development. Secure attachment enables self-regulation, but dysfunctional attachment, disorganized attachment in particular, creates fertile ground for emprise.

      • Definition of attachment: "an affective bond, and it is at the foundation, it is a necessity for human development."
      • Link with emprise: "for me there is no emprise without an attachment problem." Disorganized attachment is the type most conducive to it.
      • Origin of disorganized attachment: Attachment figures (often the parents) who are "inconsistent," "frightening or frightened," "severely depressed," "misattuned," "with unresolved traumas," or "abusive or neglectful." Neglect alone can be enough.
      • Mechanisms: The child faces a "fear without solution" and the attachment figure is "unable to regulate" the child. The child "deactivates their attachment system" and develops "control strategies" to rebalance the dysfunction.
      • Control mechanisms in emprise: "caregiving control," "punitive control," "seductive control," "submissive control."

      7. The Perpetrator's Defensive Functioning

      The relationship of emprise is a form of "defensive functioning" for the perpetrator, who tries to manage an internal problem by externalizing it.

      • Externalization: The perpetrator "externalizes their own problem" and "holds the other responsible for their own acts and their own failings."
      • Projection: "by pointing them out in the other or projecting them onto the other, this allows them not to look at their own."
      • Fear of the projection: Paradoxically, the perpetrator "will be afraid of what they see" in the victim, who is cast as an aggressor because the victim carries the projection of the perpetrator's own aggressiveness.

      8. The Evolving Legal Framework and Final Recommendation

      The justice system is evolving and now recognizes emprise under the term "coercive control", but demonstrating it remains a challenge.

      • Toward "coercive control": "emprise did not exist, so we are going to call it coercive control."
      • Legal demonstration: One must prove the aggressor's intent, the victim's negative perception of the behavior, the harm caused, and the existence of "open threats" when the victim tries to escape the control.
      • Importance of exploring the bond: Exploring the "gray zone" between victim and perpetrator "in no way calls into question the position of victim and the position of aggressor before the law... but it makes understanding possible, and I think it is this understanding that will allow things to move forward."

      In conclusion, the conference stresses the need for an integrated approach to emprise, in which an understanding of the psychological mechanisms, especially those linked to disorganized attachment, informs and enriches judicial action, despite the difficulty of translating psychiatric concepts into legal language.

    1. \(S_x = \{y \mid (x, y) \in P\}\)

      According to the definition of a partition, overlapping parts are equal. So all of the \(S_x\) parts (there are infinitely many) that share at least one \(y\) collapse into a single part. The result is two subsets that together make up \(\Omega\). From what I can tell, in the context of partitions, \(S = \Omega\).
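
      The collapsing of equal \(S_x\) sets can be checked on a small finite case. The sketch below is only an illustration under assumptions that are not in the annotated text: `omega` is a six-element set, and `P` is taken to be the equivalence relation "same parity". Both are hypothetical choices made so the example is easy to follow.

```python
from itertools import combinations

# Hypothetical finite example (an assumption, not from the source):
# omega is a small set and P relates numbers of the same parity.
omega = {0, 1, 2, 3, 4, 5}
P = {(x, y) for x in omega for y in omega if x % 2 == y % 2}

def S(x):
    """S_x = {y | (x, y) in P}."""
    return frozenset(y for (a, y) in P if a == x)

# Building a set of the S_x removes duplicates, so equal S_x collapse into one part.
parts = {S(x) for x in omega}

# Parts that overlap are equal, so the distinct parts are pairwise disjoint.
assert all(a.isdisjoint(b) for a, b in combinations(parts, 2))
# Together the parts cover omega.
assert set().union(*parts) == omega

print(sorted(sorted(p) for p in parts))  # [[0, 2, 4], [1, 3, 5]]
```

      Here six sets \(S_0, \dots, S_5\) collapse into two distinct parts because this particular relation has two classes. A different relation would give a different number of parts.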

    1. Within a school, we need to collaborate to build a team working together to solve the dilemmas in learning, to collectively share and critique the nature and quality of evidence that shows our impact on student learning, and to cooperate in planning and critiquing lessons, learning intentions, and success criteria on a regular basis.

      Working together with other teachers, team members, and stakeholders is essential to providing the best practices for a student. "It takes a village to raise a child," said Nancy Reagan. This could not be truer in this day and age. Students are in various places and positions around a school in a day. Get to know your students and who else is involved with them so you can provide them with the best of our school for their learning benefit in all areas of their life.

    1. Reviewer #2 (Public review):

      The authors have revised their manuscript in response to reviewer feedback, incorporating several modifications to improve clarity and provide additional supporting information. To address concerns about confusing terminology, they have standardized the reference to PRDM16 overexpressing cells as Prdm16_OE, clarifying its expression from a constitutive promoter. They also revised the text to resolve seemingly contradictory statements about ChP development in the mutant. New bioinformatic analysis comparing PRDM16 binding in E12.5 ChP cells to co-repressed versus BMP-only-repressed genes has been performed and included in Supplementary Figure 5C, providing a statistical assessment of PRDM16's regulatory role on co-repressed genes. Several figures were updated, including adding an illustration of the Prdm16 cGT allele to Figure 1B, providing a zoomed-in inset for Figure 1E, and including individual channels for Wnt2b and marking boundaries in Figure 7A. Full-view images and examples of spot segmentation for SCRINSHOT analysis are now available in a new supplementary figure, and the presentation of RT-qPCR data in Supplementary Figure 2B was improved by using separate graphs for overexpression samples to avoid a broken Y-axis. Furthermore, the authors have added more references to introductory statements, annotated structures like the ChP, CH, and fourth ventricle in figures, and clarified that the beta-Gal signal was used as a marker for mutant ChP cells in Figure 1D. Finally, the manuscript now includes a discussion of the recently published, related study by Hurwitz et al. (2023) in the discussion section, highlighting similarities and differences. Overall, the authors have satisfactorily addressed the reviewers' comments.

    2. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the role of PRDM16 in modulating BMP response during choroid plexus (ChP) development. The authors combine PRDM16 knockout mice and cultured PRDM16 KO primary neural stem cells (NSCs) to determine the interactions between BMP signaling and PRDM16 in ChP differentiation.

      They show PRDM16 KO affects ChP development in vivo and BMP4 response in vitro. They determine genes regulated by BMP and PRDM16 by ChIP-seq or CUT&TAG for PRDM16, pSMAD1/5/8, and SMAD4. They then measure gene activity in primary NSCs through H3K4me3 and find more genes are co-repressed than co-activated by BMP signaling and PRDM16. They focus on the 31 genes found to be co-repressed by BMP and PRDM16. Wnt7b is in this set and the authors then provide evidence that PRDM16 and BMP signaling together repress Wnt activity in the developing choroid plexus.

      Strengths:

      Understanding context-dependent responses to cell signals during development is an important problem. The authors use a powerful combination of in vivo and in vitro systems to dissect how PRDM16 may modulate BMP response in early brain development.

      We thank the reviewer for the thoughtful summary and positive feedback. We appreciate the recognition of our integrative in vivo and in vitro approach. We're glad the reviewer found our findings on context-dependent gene regulation and developmental signaling valuable.

      Main weaknesses of the experimental setup:

      (1) Because the authors state that primary NSCs cultured in vitro lose endogenous Prdm16 expression, they drive expression by a constitutive promoter. However, this means the expression levels are very different from endogenous levels (as explicitly shown in Supplementary Figure 2B) and the effect of many transcription factors is strongly dose-dependent, likely creating differences between the PRDM16-dependent transcriptional response in the in vitro system and in vivo.

      We acknowledge that our in vitro experiments may not ideally replicate the in vivo situation, a common limitation of such experiments. Our primary aim was to explore the molecular relationship between PRDM16 and BMP signaling in gene regulation. Such molecular investigations are challenging to conduct using in vivo tissues. In vitro NSCs treated with BMP4 have been used as a model to investigate NSC proliferation and quiescence in previous studies (e.g., Helena Mira, 2010; Marlen Knobloch, 2017). Crucially, to ensure the relevance of our in vitro findings to the in vivo context, we confirmed that cultured cells could indeed be induced into quiescence by BMP4, and that this induction required the presence of PRDM16. Furthermore, upon identifying target genes co-regulated by PRDM16 and SMADs, we validated PRDM16's regulatory role on a subset of these genes in the developing choroid plexus (ChP) (Fig. 7 and Suppl. Fig. 7-8). Only by combining evidence from both in vitro and in vivo experiments could we confidently conclude that PRDM16 serves as an essential co-factor for BMP signaling in restricting NSC proliferation.

      (2) It seems that the authors compare Prdm16_KO cells to Prdm16 WT cells overexpressing flag_Prdm16. Aside from the possible expression of endogenous Prdm16, other cell differences may have arisen between these cell lines. A properly controlled experiment would compare Prdm16_KO ctrl (possibly infected with a control vector without Prdm16) to Prdm16_KO_E (i.e. the Prdm16_KO cells with and without Prdm16 overexpression.)

      We agree that Prdm16 KO cells carrying the Prdm16-expressing vector would be a good comparison with those with KO_vector. However, despite more than 10 attempts with various optimization conditions, we were unable to establish a viable cell line after infecting Prdm16 KO cells with the Prdm16-expressing vector. The overall survival rate for primary NSCs after viral infection is low, and we observed that KO cells were particularly sensitive to infection treatment when the viral vector was large (the Prdm16 ORF is more than 3kb).

      As an alternative way to assess vector effects, we instead included two other control cell lines, wt and KO cells infected with the 3xNLS_Flag-tag viral vector, and presented the results in Supplementary Fig. 2. When we compared the responses of the four lines (wt, KO, wt infected with the Flag vector, and KO infected with the Flag vector) to the addition and removal of BMP4, we confirmed that the viral infection itself has no significant impact on the responses of these cells to these treatments in terms of cell proliferation and Ttr induction.

      Given that wt cells and KO cells, with or without viral backbone infection, behave quite similarly in terms of cell proliferation, we speculate that even if we had succeeded in obtaining a cell line with the Prdm16-expressing vector in the KO cells, it might not exhibit substantial differences compared to wt cells infected with the Prdm16-expressing vector.

      Other experimental weaknesses that make the evidence less convincing:

      (1) The authors show in Figure 2E that Ttr is not upregulated by BMP4 in PRDM16_KO NSCs. Does this appear inconsistent with the presence of Ttr expression in the PRDM16_KO brain in Figure1C?

      The reviewer’s point is that there was no significant increase in Ttr expression in Prdm16_KO cells after BMP4 treatment (Fig. 2E), but residual Ttr mRNA signals remained in the Prdm16 mutant ChP (Fig. 1C). We think the difference lies in the measurable level of Ttr expression between that induced by BMP4 in NSC culture and that in the ChP. This is based on our immunostaining experiment in which we tried to detect Ttr using a Ttr antibody. This antibody could not detect the Ttr protein in BMP4-treated Prdm16_expressing NSCs but clearly showed Ttr signal in the wt ChP. This means that although Ttr expression can be significantly increased by BMP4 in vitro to a level measurable by RT-qPCR, its absolute quantity even in the Prdm16_expressing condition is much lower than that in vivo. Our results in Fig. 1C and Fig. 2E, as well as Fig. 7B, all consistently showed that Prdm16 depletion significantly reduced Ttr expression both in vitro and in vivo.

      (2) Figure 3: The authors use H3K4me3 to measure gene activity. This is however, very indirect, with bulk RNA-seq providing the most direct readout and polymerase binding (ChIP-seq) another more direct readout. Transcription can be regulated without expected changes in histone methylation, see e.g. papers from Josh Brickman. They verify their H3K4me3 predictions with qPCR for a select number of genes, all related to the kinetochore, but it is not clear why these genes were picked, and one could worry whether these are representative.

      H3K4me3 has been widely used as an indicator of active transcription and is a mark for cell identity genes. It has also been demonstrated that H3K4me3 has a direct function in regulating transcription at the step of RNA Pol II pause release. As stated in the text, there are advantages and disadvantages to using H3K4me3 compared with RNA-seq. RNA-seq profiles all gene products, which are affected by transcription as well as RNA stability and turnover. In contrast, H3K4me3 levels at gene promoters reflect transcriptional activity. In our case, we aimed to identify differential gene expression between the proliferation and quiescence states. The transition between these two states is fast and dynamic. RNA-seq may fail to identify functionally relevant genes and is more likely to produce false-positive and false-negative results. Therefore, we chose H3K4me3 profiling.

      We agree that transcription may change without histone methylation changes. This may cause an under-estimation of the number of changed genes between the conditions. 

      We validated 7 out of 31 genes (Wnt7b, Id3, Mybl2, Spc24, Spc25, Ndc80 and Nuf2). We chose these genes based on two criteria: 1) their function is implicated in cell proliferation and cell-cycle regulation based on gene ontology analysis; 2) their gene products are detectable in the developing ChP based on the scRNA-seq data. Three of these genes (Wnt7b, Id3, Mybl2) are not related to the kinetochore. We now clarify this description in the revised text.

      (3) Line 256: The overlap of 31 genes between 184 BMP-repressed genes and 240 PRDM16-repressed genes seems quite small.

      This result indicates that in addition to co-repressing cell-cycle genes, BMP and PRDM16 have independent functions. For example, it was reported that BMP regulates neuronal and astrocyte differentiation (Katada, S. 2021), while our previous work demonstrated that Prdm16 controls the temporal identity of NSCs (He, L. 2021).

      (4) The Wnt7b H3K4me3 track in Fig. 3G is not discussed in the text but it shows H3K4me3 high in _KO and low in _E regardless of BMP4. This seems to contradict the heatmap of H3K4me3 in Figure 3E which shows H3K4me3 high in _E no BMP4 and low in _E BMP4 while omitting _KO no BMP4. Meanwhile CDKN1A, the other gene shown in 3G, is missing from 3E.

      The track in Fig. 3G shows the absolute signal of H3K4me3 after mapping the sequencing reads to the genome and normalizing them to library size. Comparing the signal in Prdm16_E with BMP4 to that in Prdm16_E without BMP4, the one with BMP4 has a lower peak. The same trend can be seen for the pair of Prdm16_KO cells with or without BMP4. The heatmap in Fig. 3E shows the relative level of H3K4me3 in three conditions. The Prdm16_E cells with BMP4 have the lowest level, while the other two conditions (Prdm16_KO with BMP4 and Prdm16_E without BMP4) display higher levels. These two graphs show a consistent trend of H3K4me3 changes at the Wnt7b promoter across these conditions. Figure 3E only includes genes that are co-repressed by PRDM16 and BMP. CDKN1A’s H3K4me3 signals are consistent between the conditions, and thus it is not a PRDM16- or BMP-regulated gene. We use it as a negative control.

      (5) The authors use PRDM16 CUT&TAG on dissected dorsal midline tissues to determine if their 31 identified PRDM16-BMP4 co-repressed genes are regulated directly by PRDM16 in vivo. By manual inspection, they find that "most" of these show a PRDM16 peak. How many is most? If using the same parameters for determining peaks, how many genes in an appropriately chosen negative control set of genes would show peaks? Can the authors rigorously establish the statistical significance of this observation? And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.

      In our text, we indicated the genes containing PRDM16 binding peaks in the figures and described them as “Text in black in Fig. 6A and Supplementary Fig. 5A”. We will add the precise number “25 of these genes” in the main text to clarify it. We used the 184 - 31 = 153 BMP-only-repressed genes (excluding the PRDM16-BMP4 co-repressed genes) as a negative control set. By computationally determining the nearest TSS to each PRDM16 peak, we identified 24/31 co-repressed genes and 84/153 BMP-only-repressed genes containing PRDM16 peaks in the E12.5 ChP data. Fisher’s Exact Test comparing the proportions yields a P-value of 0.015.
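
      For readers who want to see how such a comparison of proportions is set up, the counts above (24 of 31 versus 84 of 153) form a 2x2 table that Fisher's exact test evaluates with the hypergeometric distribution. The sketch below is a minimal pure-Python version. The response does not state which software was used or whether the test was one- or two-sided, so the function name `fisher_exact` and the choice to report both p-values are assumptions made here for illustration.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided, two_sided), where one_sided is the probability of a
    top-left count >= a (enrichment) with all margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def pmf(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo = max(0, row1 - (n - col1))   # smallest feasible top-left count
    hi = min(row1, col1)             # largest feasible top-left count
    p_obs = pmf(a)
    one_sided = sum(pmf(k) for k in range(a, hi + 1))
    # Two-sided: sum over all tables no more probable than the observed one.
    two_sided = sum(pmf(k) for k in range(lo, hi + 1)
                    if pmf(k) <= p_obs * (1 + 1e-9))
    return one_sided, min(1.0, two_sided)

# Rows: co-repressed genes, BMP-only-repressed genes.
# Columns: with a PRDM16 peak, without a PRDM16 peak.
p_one, p_two = fisher_exact(24, 31 - 24, 84, 153 - 84)
print(f"one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

      The one-sided value asks only whether co-repressed genes are enriched for PRDM16 peaks, while the two-sided value also counts equally extreme depletion, so it is never smaller than the one-sided value.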

      We are unsure about the second part of the comment: “And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.” If the reviewer is asking why we did not sequence the material from sequential ChIP or validate more target genes, the reason is the limited amount of material. Sequential ChIP requires a large quantity of antibody and, after the second round of IP, yields barely enough material for a few qPCR reactions. This yield was far below the minimum required for library construction. The PRDM16 antibody was a gift, and the quantity we had was very limited. We made considerable efforts to optimize all available commercial antibodies for ChIP and Cut&Tag, but none of them worked in these assays.

      (6) In comparing RNA in situ between WT and PRDM16 KO in Figure 7, the authors state they use the Wnt2b signal to identify the border between CH and neocortex. However, the Wnt2b signal is shown in grey and it is impossible for this reviewer to see clear Wnt2b expression or where the boundaries are in Figure 7A. The authors also do not show where they placed the boundaries in their analysis. Furthermore, Figure 7B only shows insets for one of the regions being compared making it difficult to see differences from the other region. Finally, the authors do not show an example of their spot segmentation to judge whether their spot counting is reliable. Overall, this makes it difficult to judge whether the quantification in Figure 7C can be trusted.

      In the revised manuscript we have included an individual channel for Wnt2b and marked the boundaries. We also provide full-view images and examples of spot segmentation in the new Supplementary Figure 8.

      (7) The correlation between mKi67 and Axin2 in Figure 7 is interesting but does not convincingly show that Wnt downstream of PRDM16 and BMP is responsible for the increased proliferation in PRDM16 mutants.

      We agree that this result (the correlation between mKi67 and Axin2) alone only suggests that Wnt signaling is related to the proliferation defect in the Prdm16 mutant, and does not necessarily mean that Wnt is downstream of PRDM16 and BMP. Our conclusion is backed up by two additional lines of evidence: the Cut&Tag data, in which PRDM16 binds to regulatory regions of Wnt7b and Wnt3a; and the finding that BMP and PRDM16 co-repress Wnt7b in vitro.

      An ideal result would be that down-regulating Wnt signaling in the Prdm16 mutant rescues the Prdm16 mutant phenotype. Such an experiment is technically challenging. Wnt plays diverse and essential roles in NSC regulation, and one would need a cell-type- and stage-specific tool to down-regulate Wnt in the Prdm16 mutant background. Moreover, Wnt genes are not the only targets regulated by PRDM16 in these cells, and down-regulating Wnt may not be sufficient to rescue the phenotype.

      Weaknesses of the presentation:

      Overall, the manuscript is not easy to read. This can cause confusion.

      We have revised the text to improve clarity.

      Reviewer #1 (Recommendations for the authors):

      (1) Overall, the manuscript is not easy to read. Here are some causes of confusion for which the presentation could be cleaned up:

      We are grateful for the reviewer’s suggestion. In the revised manuscript, we have made efforts to improve the clarity of the text.

      (a) Part of the first section is confusing in that some statements seem contradictory, in particular:

      "there is no overall patterning defect of ChP and CH in the Prdm16 mutant" (line 125)

      "Prdm16 depletion disrupted the transition from neural progenitors into ChP epithelia" (line 144)

      It would be helpful if the authors could reformulate this more clearly.

      We modified the text to clarify that while the BMP-patterned domain is not affected, the transition of NSCs into ChP epithelial cells is compromised in the Prdm16 mutant.

      (b) Flag_PRDM16, PRDM16_expressing, PRDM16_E, PRDM16 OE all seem to refer to the same PRDM16 overexpressing cells, which is very confusing. The authors should use consistent naming. Moreover, it would be good if they renamed these all to PRDM16_OE to indicate expression is not endogenous but driven by a constitutive promoter.

      We appreciate the comment and agree that the use of multiple terms to refer to the same PRDM16-overexpressing condition was confusing. Our original intention in using Prdm16_E was to distinguish cells expressing PRDM16 from the two other groups: wild-type cells and Prdm16_KO cells, which both lack PRDM16 protein expression. However, we acknowledge that Prdm16_E could be misinterpreted as indicating expression from the endogenous Prdm16 promoter. To avoid this confusion and ensure consistency, we have now standardized the terminology and refer to this condition as Prdm16_OE, indicating Flag-tagged PRDM16 expression driven by a constitutive promoter.

      (c) Line 179 states "generated a cell line by infecting Prdm16_KO cells with the same viral vector, expressing 3xNSL_Flag". Do the authors mean 3xNLS_Flag_Prdm16, so these are the Prdm16_KO_E cells by the notation suggested above? Or is this a control vector with Flag only? The following paragraph refers to Supplementary Figure 2C-F where the same construct is called KO_CDH, suggesting this was an empty CDH vector, without Flag, or Prdm16. This is confusing.

      We appreciate the reviewer’s careful reading and helpful comment. We acknowledge the confusion caused by the inconsistent terminology. To clarify: in line 179, we intended to describe an attempt to generate a Prdm16_KO cell line expressing 3xNLS_Flag_Prdm16, not a control vector with Flag only. However, despite repeated attempts, we were unable to establish this line due to low viral efficiency and the vulnerability of Prdm16_KO cells to infection with the large construct. Therefore, these cells were not included in the subsequent analyses.

      The term KO_CDH refers to Prdm16_KO cells infected with the empty CDH control vector, which lacks both Flag and Prdm16. This is the line used in the experiments shown in Supplementary Fig. 2C–F. We have revised the text throughout the manuscript to ensure consistent use of terminology and to avoid this confusion.

      (2) The introductory statements on lines 53-54 could use more references.

      Thanks for the suggestion. We have now included more references.

      (3) It would be helpful if all structures described in the introduction and first section were annotated in Figure 1, or otherwise, if a cartoon were included. For example, the cortical hem, and fourth ventricle.

      Thanks for the suggestion. We have now indicated the structures, ChP, CH and the fourth ventricle, in the images in Figure 1 and Supplementary Figure 1.

      (4) In line 115, "as previously shown.." - to keep the paper self-contained a figure illustrating the genetics of the KO allele would be helpful.

      Thanks for the suggestion. We have now included an illustration of the Prdm16 cGT allele in Figure 1B.

      (5) In Figure 1D a costain for a ChP marker would be helpful because it is hard to identify morphologically in the Prdm16 KO.

      We apologize for the lack of clarity. The KO allele contains a b-geo reporter driven by the endogenous Prdm16 promoter. The samples were co-stained for EdU, b-Gal and DAPI. To distinguish the ChP domain from the CH, we used the presence of b-Gal as a marker. We indicated this in the figure legend, and have now also clarified it in the revised text.

      (6) The details in Figure 1E are hard to see, a zoomed-in inset would help.

      A zoomed-in inset is now included in the figure.

      (7) Supplementary Figure 2A does not convincingly show that PRDM16 protein is undetectable since endogenous expression may be very low compared to the overexpression PRDM16_E cells so if the contrast is scaled together it could appear black like the KO.

      We appreciate the reviewer’s point and have carefully considered this concern. We concluded that PRDM16 protein is effectively undetectable in cultured wild-type NSCs based on direct comparison with brain tissue. Both cultured NSCs and brain sections were processed under similar immunostaining and imaging conditions. While PRDM16 showed robust and specific nuclear localization in embryonic brain sections (Fig. 1B and Supplementary Fig. 1A), only a small subset of cultured NSCs exhibited PRDM16 signal, primarily in the cytoplasm (middle panel of Fig. 2A). This stark contrast supports our conclusion that endogenous PRDM16 protein is either absent or significantly downregulated in vitro. Because of this limitation, we turned to over-expressing Prdm16 in NSC culture using a constitutive promoter. 

      (9) Line 182 "Following the washout step" - no such step had been described, maybe replace by "After washout of BMP".

      Yes, we have revised the text.

      (8) Line 214: "indicating a modest level" - what defines modest? Compared to what? Why is a few thousand moderate rather than low? Does it go to zero with inhibitors for pathways?

      Here, a modest level means a level lower than that after adding BMP4. To clarify this, we revised the description to “indicating endogenous levels of …”

      (9) The way qPCR data are displayed makes it difficult to appreciate the magnitude of changes, e.g. in Supplementary Figure 2B where a gap is introduced on the scale. Displaying log fold change / relative CT values would be more informative.

      We used a segmented Y-axis in Supplementary Figure 2B because the Prdm16 overexpression samples exhibited much higher expression levels than the other conditions. In response to this suggestion, we explored alternative ways to present the result, including plotting log-transformed values and log fold changes. However, these methods did not make the differences clearer; in fact, log scaling made the magnitude of change less apparent. To address this, we now present the overexpression samples in a separate graph, thereby eliminating the need for a broken Y-axis and improving the overall readability of the data.
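
      For readers unfamiliar with the log fold change display discussed in this point, the following is a minimal sketch of how such a value can be derived from raw Ct values with the standard ΔΔCt approach. It is not the analysis used in the manuscript: the function name, the reference-gene normalization, the Ct values, and the assumption of roughly 100% amplification efficiency are all hypothetical and for illustration only.

```python
def log2_fold_change(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Return the log2 fold change of a target gene (sample vs. control).

    Assumes ~100% amplification efficiency, so one Ct cycle equals a
    two-fold difference and log2(fold change) = -ddCt.
    """
    # Normalize the target gene to a reference gene within each condition.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between the two conditions.
    dd_ct = d_ct_sample - d_ct_control
    return -dd_ct


# Hypothetical Ct values: an overexpression sample versus a control.
lfc = log2_fold_change(18.0, 16.0, 26.0, 16.0)
print(lfc, 2 ** lfc)
```

      On a log2 scale a very large linear fold change becomes a single-digit number, which is why strongly overexpressing samples can share an axis with the other conditions but also why differences can look visually compressed.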

      (10) Writing out "3 days" instead of 3D in Figure 2A would improve clarity. It would be good if the used time interval is repeated in other figures throughout the paper so it is still clear the comparison is between 0 and 3 days.

      We have changed “3D” to “3 days”. All BMP4 treatments in this study were 3 days.

      (11) Line 290: "we found that over 50% of SMAD4 and pSMAD1/5/8 binding peaks were consistent in Prdm16_E and Prdm16_KO cells, indicating that deletion of Prdm16 does not affect the general genomic binding ability of these proteins" - this only makes sense to state with appropriate controls because 50% seems like a big difference, what is the sample to sample variability for the same condition? Moreover, the next paragraph seems to contradict this, ending with "This result suggests that SMAD binding to these sites depends on PRDM16". The authors should probably clarify the writing.

      We appreciate the reviewer’s comment and agree that clarification was needed. Our point was that SMAD4 and pSMAD1/5/8 retain the ability to bind DNA broadly in the Prdm16 KO cells, with more than half of the original binding sites still occupied. This suggests that deletion of Prdm16 does not globally impair SMAD genomic binding. However, our primary interest lies in the subset of sites that show differential SMAD binding between wt and Prdm16 KO conditions, as these are likely to be PRDM16-dependent.

      In the following paragraph, we focused specifically on describing SMAD and PRDM16 co-bound sites. At these loci, SMAD4 and pSMAD1/5/8 showed reduced enrichment in the absence of PRDM16, suggesting PRDM16 facilitates SMAD binding at these particular regions. We have revised the text in the manuscript to more clearly distinguish between global SMAD binding and PRDM16-dependent sites.
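
      As an illustration of the kind of comparison discussed in this point, the sketch below computes the fraction of peaks in one set that overlap at least one peak in another set. It is a simplified stand-in, not the pipeline used in the study: the function name, the peak coordinates, and the "any overlap" criterion are assumptions made for the example.

```python
from collections import defaultdict


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a overlapping at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    if not peaks_a:
        return 0.0
    # Group the second peak set by chromosome for lookup.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    shared = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom[chrom]):
            shared += 1
    return shared / len(peaks_a)


# Hypothetical peak sets for two conditions.
peaks_e = [("chr1", 100, 200), ("chr1", 500, 600),
           ("chr2", 50, 150), ("chr2", 900, 1000)]
peaks_ko = [("chr1", 150, 250), ("chr2", 100, 120), ("chr3", 10, 20)]
print(fraction_overlapping(peaks_e, peaks_ko))
```

      Applying the same function to two replicates of a single condition would give the sample-to-sample baseline the reviewer asks about, against which an overlap between conditions can be judged.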

      (12) Much more convincing than ChIP-qPCR for c-FOS for two loci in Figures 5F-G would be a global analysis of c-FOS ChIP-seq data.

      We agree that a global c-FOS ChIP-seq analysis would provide a more comprehensive view of c-FOS binding patterns. However, the primary focus of this study is the interaction between BMP signaling and PRDM16. The enrichment of AP-1 motifs at ectopic SMAD4 binding sites was an unexpected finding, which we validated using c-FOS ChIP-qPCR at selected loci. While a genome-wide analysis would be valuable, it falls beyond the current scope. We agree that future studies exploring the interplay among SMAD4/pSMAD, PRDM16, and AP-1 will be important and informative.

      (13) Figure 6A is hard to read. A heatmap would make it much easier to see differences in expression. Furthermore, if the point is to see the difference between ChP and CH, why not combine the different subclusters belonging to those structures? Finally, why are there 28 genes total when it is said the authors are evaluating a list of 31 genes and also displaying 6 genes that are not expressed (so the difference isn't that unexpressed genes are omitted)?

      For the scRNA-seq data, we chose violin plots because they display both gene expression levels and the number of cells that express each gene. However, we agree that the labels in Figure 6A were too small and difficult to read. We have revised the figure by increasing the font size and moving genes with low expression to Supplementary Figure 5A. Figure 6A includes 17 more highly expressed genes together with three markers, and Supplementary Figure 5A contains 13 lowly expressed genes. One gene, Mrtfb, is missing from the scRNA-seq data and is thus not included. We have revised the description of the result in the main text and figure legends.

      Reviewer #2 (Public review):

      Summary:

      This article investigates the role of PRDM16 in regulating cell proliferation and differentiation during choroid plexus (ChP) development in mice. The study finds that PRDM16 acts as a corepressor in the BMP signaling pathway, which is crucial for ChP formation.

      The key findings of the study are:

      (1) PRDM16 promotes cell cycle exit in neural epithelial cells at the ChP primordium.

      (2) PRDM16 and BMP signaling work together to induce neural stem cell (NSC) quiescence in vitro.

      (3) BMP signaling and PRDM16 cooperatively repress proliferation genes.

      (4) PRDM16 assists genomic binding of SMAD4 and pSMAD1/5/8.

      (5) Genes co-regulated by SMADs and PRDM16 in NSCs are repressed in the developing ChP.

      (6) PRDM16 represses Wnt7b and Wnt activity in the developing ChP.

      (7) Levels of Wnt activity correlate with cell proliferation in the developing ChP and CH.

      In summary, this study identifies PRDM16 as a key regulator of the balance between BMP and Wnt signaling during ChP development. PRDM16 facilitates the repressive function of BMP signaling on cell proliferation while simultaneously suppressing Wnt signaling. This interplay between signaling pathways and PRDM16 is essential for the proper specification and differentiation of ChP epithelial cells. This study provides new insights into the molecular mechanisms governing ChP development and may have implications for understanding the pathogenesis of ChP tumors and other related diseases.

      Strengths:

      (1) Combining in vitro and in vivo experiments to provide a comprehensive understanding of PRDM16 function in ChP development.

      (2) Use of a variety of techniques, including immunostaining, RNA in situ hybridization, RT-qPCR, CUT&Tag, ChIP-seq, and SCRINSHOT.

      (3) Identifying a novel role for PRDM16 in regulating the balance between BMP and Wnt signaling.

      (4) Providing a mechanistic explanation for how PRDM16 enhances the repressive function of BMP signaling. The identification of SMAD palindromic motifs as preferred binding sites for the SMAD/PRDM16 complex suggests a specific mechanism for PRDM16-mediated gene repression.

      (5) Highlighting the potential clinical relevance of PRDM16 in the context of ChP tumors and other related diseases. By demonstrating the crucial role of PRDM16 in controlling ChP development, the study suggests that dysregulation of PRDM16 may contribute to the pathogenesis of these conditions.

      We thank the reviewer for the thorough and thoughtful summary of our study. We’re glad the key findings and significance of our work were clearly conveyed, particularly regarding the role of PRDM16 in coordinating BMP and Wnt signaling during ChP development. We also appreciate the recognition of our integrated approach and the potential implications for understanding ChP-related diseases.

      Weaknesses:

      (1) Limited investigation of the mechanism controlling PRDM16 protein stability and nuclear localization in vivo. The study observed that PRDM16 protein became nearly undetectable in NSCs cultured in vitro, despite high mRNA levels. While the authors speculate that post-translational modifications might regulate PRDM16 in NSCs similar to brown adipocytes, further investigation is needed to confirm this and understand the precise mechanism controlling PRDM16 protein levels in vivo.

      While the mechanisms controlling PRDM16 protein stability and nuclear localization in the developing brain are interesting, the scope of this paper is to reveal the function of PRDM16 in the choroid plexus and its interaction with BMP signaling. We will be happy to pursue this direction in our next study.

      (2) Reliance on overexpression of PRDM16 in NSC cultures. To study PRDM16 function in vitro, the authors used a lentiviral construct to constitutively express PRDM16 in NSCs. While this approach allowed them to overcome the issue of low PRDM16 protein levels in vitro, it is important to consider that overexpressing PRDM16 may not fully recapitulate its physiological role in regulating gene expression and cell behavior.

      As stated above, we acknowledge that findings from cultured NSCs may not directly apply to ChP cells in vivo, and we are cautious in our statements. The cell culture work aimed to identify potential mechanisms by which PRDM16 and SMADs interact to regulate gene expression, and to identify target genes co-regulated by these factors. We expect that not all targets from cell culture are regulated by PRDM16 and SMADs in the ChP, so we validated expression changes of several target genes in the developing ChP and have now included the new data in Fig. 7 and Supplementary Fig. 7. Out of the 31 genes identified from cultured cells, four cell cycle regulators, including Wnt7b, Id3, Spc24/25/nuf2 and Mybl2, showed de-repression in Prdm16 mutant ChP. These genes may be relevant downstream genes in the ChP, whereas other target genes may be cortical NSC-specific or less dependent on Prdm16 in vivo.

      (3) Lack of direct evidence for AP1 as the co-factor responsible for SMAD relocation in the absence of PRDM16. While the study identified the AP1 motif as enriched in SMAD binding sites in Prdm16 knockout cells, they only provided ChIP-qPCR validation for c-FOS binding at two specific loci (Wnt7b and Id3). Further investigation is needed to confirm the direct interaction between AP1 and SMAD proteins in the absence of PRDM16 and to rule out other potential co-factors.

      We agree that the enrichment of the AP1 motif at the PRDM16 and SMAD co-binding regions in Prdm16 KO cells can only indirectly suggest AP1 as a co-factor for SMAD relocation. That is why we used ChIP-qPCR to examine the presence of c-FOS at these sites. Although we validated only two targets, the result confirms that c-FOS binds to these sites in the Prdm16 KO cells but not in Prdm16-expressing cells, suggesting that AP1 is a co-factor. Our results cannot rule out the presence of other co-factors.

      Reviewer #2 (Recommendations for the authors):

      Minor typo: [7, page 3] "sicne" should be "since".

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised parts of the text to improve clarity.

      Reviewer #3 (Public review):

      Summary:

      Bone morphogenetic protein (BMP) signaling instructs multiple processes during development including cell proliferation and differentiation. The authors set out to understand the role of PRDM16 in these various functions of BMP signaling. They find that PRDM16 and BMP co-operate to repress stem cell proliferation by regulating the genomic distribution of BMP pathway transcription factors. They additionally show that PRDM16 impacts choroid plexus epithelial cell specification. The authors provide evidence for a regulatory circuit (consisting of BMP, PRDM16, and Wnt) that influences stem cell proliferation/differentiation.

      Strengths:

      I find the topics studied by the authors in this study of general interest to the field, the experiments well-controlled and the analysis in the paper sound.

      We thank the reviewer for their positive feedback and thoughtful summary. We appreciate the recognition of our efforts to define the role of PRDM16 in BMP signaling and stem cell regulation, as well as the soundness of our experimental design and analysis.

      Weaknesses:

      I have no major scientific concerns. I have some minor recommendations that will help improve the paper (regarding the discussion).

      We have revised the discussion according to the suggestions.

      Reviewer #3 (Recommendations for the authors):

      Specific minor recommendations:

      Page 18. Line 526: In a footnote, the authors point out a recent report which in parallel was investigating the link between PRDM16 and SMAD4. There is substantial non-overlap between these two papers. To aid the reader, I would encourage the authors to discuss that paper in the discussion section of the manuscript itself, highlighting any similarities/differences in the topic/results.

      Thanks for the suggestion. We have now included the comparison in the discussion. One conclusion is consistent between our study and this publication: PRDM16 functions as a co-repressor of SMAD4. However, the mechanisms are different. Our data suggest a model in which PRDM16 facilitates SMAD4/pSMAD binding to repress proliferation genes under high BMP conditions. The other report, in contrast, suggests that SMAD4 steadily binds to the Prdm16 promoter and switches regulatory functions depending on the co-factors. Together with PRDM16, SMAD4 represses gene expression, while with SMAD3, in response to high levels of TGF-b1, it activates gene expression. These differences could be due to different signaling pathways (BMP versus TGF-b), contexts (NSCs versus pancreatic cancers), etc.

      Page 3. Line 65: typo 'since'

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised the text to improve clarity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Duilio M. Potenza et al. explores the role of Arginase II in cardiac aging, majorly using whole-body arg-ii knock-out mice. In this work, the authors have found that Arg-II exerts non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. The authors have used arg II KO mice and an in vitro culture system to study the role of Arg II. The authors have also reported the cell-autonomous effect of Arg-II through mitochondrial ROS in fibroblasts that contribute to cardiac aging. These findings are sufficiently novel in cardiac aging and provide interesting insights. While the phenotypic data seems strong, the mechanistic details are unclear. How Arg II regulates the IL-1b and modulates cardiac aging is still being determined. The authors still need to determine whether Arg II in fibroblasts and endothelial contributes to cardiac fibrosis and cell death. This study also lacks a comprehensive understanding of the pathways modulated by Arg II to regulate cardiac aging.

      We sincerely appreciate the valuable feedback provided by the reviewer. It is gratifying to hear that our work provided novel information on the role of arginase-II in cardiac aging, which is a complex process involving various cell types and mechanisms. We have devoted considerable effort to performing new experiments to address the reviewer's comments and to delineate more detailed mechanisms of Arg-II in cardiac aging. Please see our specific answers to each of the reviewer's points below.

      Strengths:

      This study provides interesting information on the role of Arg II in cardiac aging.

      The phenotypic data in the arg II KO mice is convincing, and the authors have assessed most of the aging-related changes.

      The data is supported by an in vitro cell culture system.

      We appreciate this reviewer’s positive assessment on the strength of our study.

      Weaknesses:

      The manuscript needs more mechanistic details on how Arg II regulates IL-1b and modulates cardiac aging.

      We made a great effort and performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in the human THP1 monocytes in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b precursor are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). Moreover, in the mouse bone-marrow-derived macrophages, LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Altogether, these results suggest that iNOS slightly reduces Arg-II expression. Arg-II and iNOS can be upregulated by LPS independently. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation, as illustrated in Suppl. Fig. 6G. For detailed results and discussion, please see answers to the comments point 2 or point 6 raised by this reviewer.

      The authors used whole-body KO mice, and the role of macrophages in cardiac aging is not studied in this model. A macrophage-specific arg II Ko would be a better model.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop the macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific function of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      Experiments need to validate the deficiency of Arg II in cardiomyocytes.

      As pointed out by this reviewer in comment point 10, Arg-II was previously reported to be expressed in isolated cardiomyocytes from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to exclude possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate the antibody specificity and to rule out background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in the expression of Arg-II in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in the rat myocardial infarction tissues, Arg-II was not found in cardiomyocytes but in endocardium cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signals could be detected, while in fibroblasts from the same animals, elevated Arg-II levels under hypoxia are demonstrated (Fig. 5B). Furthermore, even RT-qPCR could not detect arg-ii mRNA in cardiomyocytes, whereas it was detected in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      The authors have never investigated the possibility of NO involvement in this mice model.

      As mentioned above, we made a great effort and performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. The results show that Arg-II and iNOS can be upregulated by LPS independently of each other and that iNOS slightly reduces Arg-II expression. However, both Arg-II and iNOS are required for IL-1b production upon LPS stimulation. For detailed results and discussion, please see answers to the comments point 2 or point 6 raised by this reviewer.

      A co-culture system would be appropriate to understand the non-cell-autonomous functions of macrophages.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell-autonomous role of Arg-II. We think that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media released from macrophages. A co-culture system could be considered if the released factor in the conditioned medium were not stable. This is, however, not the case. Therefore, we are confident that our experimental model with conditioned medium is sufficient to demonstrate a paracrine effect of cell-cell interaction (please also see answers to the comment point 16).

      The Myocardial infarction data shown in the mice model may not be directly linked to cardiac aging.

      As we have introduced and discussed in the manuscript, aging is a predominant risk factor for cardiovascular disease (CVD). Studies in experimental animal models and in humans provide evidence that the aging heart is more vulnerable to stressors such as ischemia/reperfusion injury and myocardial infarction than the heart of young individuals. Even in the heart of apparently healthy individuals of old age, chronic inflammation, cardiomyocyte senescence, cell apoptosis, interstitial/perivascular tissue fibrosis, endothelial dysfunction and endothelial-mesenchymal transition (EndMT), and cardiac dysfunction with either preserved or reduced ejection fraction are observed. Our study aims to investigate the role of Arg-II in the cardiac aging phenotype and in age-associated cardiac vulnerability to stressors. Therefore, cardiac functional changes and myocardial infarction in response to ischemia/reperfusion injury are suitable surrogate parameters for this purpose.

      Reviewer #2 (Public Review):

      Summary:

      The results from this study demonstrated a cell-specific role of mitochondrial enzyme arginase-II (Arg-II) in heart aging and revealed a non-cell-autonomous effect of Arg-II on cardiomyocytes, fibroblasts, and endothelial cells through the crosstalk with macrophages via inflammatory factors, such as by IL-1b, as well as a cell-autonomous effect of Arg-II through mtROS in fibroblasts contributing to cardiac aging phenotype. These findings highlight the significance of non-cardiomyocytes in the heart and bring new insights into the understanding of pathologies of cardiac aging. It also provides new evidence for the development of therapeutic strategies, such as targeting the ArgII activation in macrophages.

      We're grateful for the reviewer's positive feedback, acknowledging the significant findings of our study on the role of arginase-II (Arg-II) in cardiac aging. We appreciate this reviewer’s insight into the therapeutic potential of targeting Arg-II activation in macrophages and are excited about the implications for future interventions in age-related cardiac pathologies. Thank you for recognizing the importance of our work in advancing our understanding of cardiac aging and potential therapeutic strategies.

      Strengths:

      This study targets an important clinical challenge, and the results are interesting and innovative. The experimental design is rigorous, the results are solid, and the representation is clear. The conclusion is logical and justified.

      We thank this reviewer for the positive comment.

      Weaknesses:

      The discussion could be extended a little bit to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several critical concerns, specifically about the mechanism of how Arg-II plays a role in cardiac aging.

      My major concerns are:

      (1) The authors have shown non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. A macrophage-specific Arg-II knock-out mouse model is a suitable and necessary control to establish claims.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop the macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in the specific cell types would be the next step to elucidate the cell-specific function of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      (2) This study suggests that Arg-II exerts its effect through IL-1b in cardiac ageing. However, all experiments performed to demonstrate the link between ArgII and IL-1β are correlative at best. The underlying molecular mechanism, including transcription factors involved in the regulation of IL-1β by arg-ii, has not been demonstrated.

      We sincerely appreciate this reviewer’s comment on this aspect! To make it clear, a causal role of Arg-II in promoting IL-1β production in macrophages is evidenced by the experimental results showing that old arg-ii<sup>-/-</sup> mouse heart has lower IL-1β levels than the age-matched wt mouse heart (Fig. 6A to 6D). We further showed that cellular IL-1β protein levels and release are reduced in old arg-ii<sup>-/-</sup> mouse splenic macrophages as compared to the wt cells (Fig. 7A, 7C, and 7D). This result is further confirmed with the mouse macrophage cell line RAW264.7 (Suppl. Fig. 5A and Suppl. Fig. 5C), in which we demonstrate that silencing arg-ii reduces LPS-stimulated IL-1β levels.

      In response to this reviewer’s comment (see comment point 6), we made a further effort to investigate the possible involvement of iNOS in Arg-II-regulated IL-1β production in macrophages stimulated with LPS. We performed new experiments in a human monocyte cell line (THP1), in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology.

      Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. We found that in the human THP1 monocytes in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> as compared to the THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). The results suggest that Arg-II promotes IL-1b production independently of iNOS. Moreover, the role of iNOS in IL-1b production was also studied in the mouse bone-marrow-derived macrophages in which the inos gene is ablated. The results demonstrate that LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in the BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Since arginase and iNOS share the same metabolic substrate, L-arginine, <sup>inos-/-</sup> is expected to increase IL-1b production. This is, however, not the case. A strong inhibition of IL-1β production in <sup>inos-/-</sup> macrophages is observed. These results imply that iNOS promotes IL-1β production independently of Arg-II, and that the inhibitory effect of inos deficiency on IL-1β is dominant and able to counteract Arg-II’s stimulating effect on IL-1β production. Hence, our results demonstrate that Arg-II promotes IL-1β production in macrophages independently of iNOS. Altogether, these results suggest that iNOS slightly reduces Arg-II expression. Arg-II and iNOS can be upregulated by LPS independently. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation (this concept is illustrated in Suppl. Fig. 6G). The new results are described on page 8, the last paragraph, and page 9, the 1st paragraph, and are presented in Suppl. Fig. 6. The legend to Suppl. Fig. 6 is described in the file “Supplementary figure legend-R”. The related experimental methods are updated on page 23, the last two paragraphs, and page 26, the last paragraph. The results are discussed on page 14, the last paragraph, and page 15, the first two paragraphs.

      (3) Figure 2: The authors have not validated the whole-body Arg-II knock-out mice for arg-ii ablation.

      Thanks for pointing out this missing information! We have added the information regarding genotyping of the mice in the method section on page 20, first paragraph. Moreover, Fig. 5C also confirms the genotyping of the non-cardiomyocyte cells isolated from wt and arg-ii<sup>-/-</sup> animals.

      (4) It is unclear why the authors have chosen to focus on IL-1β specifically, among other pro-inflammatory cytokines that were also downregulated in Arg-II-/- mice as demonstrated in Fig. 2A-D.

      We appreciate the reviewer's question, which provides an opportunity to delve deeper into our findings. In our investigation, we observed that aging is accompanied by elevated levels of various proinflammatory markers. Intriguingly, our data revealed that tnf-α remained unaffected by the ablation of arg-ii during aging in the heart tissues, while Il-1β showed a significant reduction in arg-ii<sup>-/-</sup> animals compared to age-matched wild-type (wt) mice (Fig. 2). Mcp1 is, however, a chemoattractant for macrophages, and F4-80 serves as a pan marker for macrophages. Moreover, our previous studies demonstrate a relationship between Arg-II and IL-1β in vascular disease and obesity and in age-associated renal and pulmonary fibrosis. Finally, IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease, as shown by the CANTOS trials. Therefore, we have focused on IL-1β in this study. We have now explained and strengthened this aspect in the manuscript on page 7, the last two lines, and page 8, the 1st paragraph, as follows:

      “Taking into account that our previous studies demonstrated a relationship of Arg-II and IL-1β in vascular disease and obesity (Ming et al., 2012) and in age-associated organ fibrosis such as renal and pulmonary fibrosis (Huang et al., 2021; Zhu et al., 2023), and IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease as shown by CANTOS trials (Ridker et al., 2017), we therefore focused on the role of IL-1β in crosstalk between macrophages and cardiac cells such as cardiomyocytes, fibroblasts and endothelial cells”.

      (5) Although macrophages are shown to be involved in cardiac ageing in the arg-ii mouse model, the authors have not estimated macrophage infiltration and expression of inflammatory or senescence markers in the hearts of these mice.

      Thank you very much for raising this important point! Taking the comments of the reviewer into account, we have performed new experiments, i.e., multiple immunofluorescent staining to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate to what extent Arg-II affects the infiltrated and resident macrophage populations in the aging heart. The results show an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2G). This result is in accordance with the f4/80 gene expression shown in Fig. 2A, demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H), are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      Moreover, the age-associated accumulation of senescent cells, as demonstrated by p16<sup>ink4</sup>-positive cells, is significantly reduced in arg-ii<sup>-/-</sup> animals. This new result is incorporated in Fig. 1 as Fig. 1G and 1H and is described and discussed on page 5, the 2nd paragraph, and page 14, the second-to-last sentence of the 1st paragraph. The method of p16<sup>ink4</sup> staining is included in the Methods section on page 22, the 1st paragraph, line 7. The legend to Fig. 1 is revised accordingly.

      (6) Previously, Arg-II has been reported to serve a crucial role in ageing associated with reduced contractile function in rat hearts by regulating Nitric Oxide Synthase (PMID: 22160208). Elevated NO and superoxide have been shown to play crucial roles in the etiology of cardiovascular diseases (PMID: 24180388). Therefore, it is important to assess whether Nitric Oxide (NO) is involved in the aging-related phenotype in this mouse model.

      Following the reviewer's suggestion, we conducted new experiments to investigate the role of nitric oxide (NO) in Arg-II-induced IL-1β production in macrophages. We have addressed this question in our response to comment 2.

      (7) Based on the results demonstrated in the study, ablation of Arg-II can be expected to cause a reduction in inflammation-associated phenotypes throughout the body at the multi-organ level. The observed improved cardiac phenotype could be an outcome of whole-body Arg-II ablation. It would be fruitful to develop a cardiac-specific Arg-II knockout mouse model to establish the role of Arg-II in the heart, independent of other organ systems.

      We agree with the comment of the reviewer on this point. Unfortunately, as explained above (see point 1), it is currently not possible for us to perform the requested experiments, due to the lack of a cardiac-specific arg-ii knockout mouse model. Moreover, such an approach is complicated by the absence of Arg-II in cardiomyocytes and the expression of Arg-II in multiple cell types, including endothelial cells, fibroblasts and macrophages of different origins (resident and monocyte-derived infiltrating cells). It is thus difficult to generate a cardiac-specific gene knockout mouse. Future work should investigate the roles of cell-specific Arg-II in cardiac aging by generating cell-specific arg-ii<sup>-/-</sup> mice. We very much appreciate this important point and have discussed the issue on page 19, lines 2 to 6.

      (8) Contrary to the findings in this paper, Arg-II has previously been reported to be essential for IL-10-mediated downregulation of pro-inflammatory cytokines, including IL-1β (PMID: 33674584).

      Thank you very much for mentioning this study! We have now thoroughly discussed the discrepancies on page 15, the last paragraph, and page 16, the 1st paragraph, as follows:

      “It is of note that a study reported that Arg-II is required for IL-10 mediated-inhibition of IL-1b in mouse BMDM upon LPS stimulation (Dowling et al., 2021), which suggests an anti-inflammatory function of Arg-II. The results of our present study, however, demonstrate that LPS enhances Arg-II and IL-1b levels in macrophages and knockout or silencing Arg-II reduces IL-1b production and release, demonstrating a pro-inflammatory effect of Arg-II. Our findings are supported by the study from another group, which shows decreased pro-inflammatory cytokine production including IL-6 and IL-1b in arg-ii<sup>-/-</sup> BMDM most likely through suppression of NFkB pathway, since arg-ii<sup>-/-</sup> BMDM reveals decreased activation of NFkB and IL-1b levels upon LPS stimulation (Uchida et al., 2023). Most importantly, our previous study also showed that re-introducing arg-ii gene back to the arg-ii<sup>-/-</sup> macrophages markedly enhances LPS-stimulated pro-inflammatory cytokine production (Ming et al., 2012), providing further evidence for a pro-inflammatory role of arg-ii under LPS stimulation. In support of this conclusion, chronic inflammatory diseases such as atherosclerosis and type 2 diabetes (Ming et al., 2012), inflammaging in lung (Zhu et al., 2023), kidney (Huang et al., 2021) and pancreas (Xiong, Yepuri, Necetin, et al., 2017) of aged animals or acute organ injury such as acute ischemic/reperfusion or cisplatin-induced renal injury are reduced in the arg-ii<sup>-/-</sup> mice (Uchida et al., 2023). The discrepant findings between these studies and that with IL-10 may implicate dichotomous functions of Arg-II in macrophages, depending on the experimental context or conditions. Nevertheless, our results strongly implicate a pro-inflammatory role of Arg-II in macrophages in the inflammaging in aging heart”.

      (9) The authors have only performed immunofluorescence-based experiments to show fibrotic and apoptotic phenotypes throughout this study. To verify these findings, we suggest that they additionally perform RT-PCR or western blotting analysis for fibrotic markers and apoptotic markers.

      The fibrotic aspect was analyzed not only by microscopy but also by a quantitative biochemical assay, namely hydroxyproline content assessment. Hydroxyproline is a major component of collagen and is largely restricted to collagen. Therefore, the measurement of hydroxyproline levels can be used as an indicator of collagen content, as previously investigated in the lung (Zhu et al., 2023). We have also measured collagen gene expression by RT-qPCR as suggested by the reviewer and found an age-related decline of collagen mRNA expression levels in both wt and arg-ii<sup>-/-</sup> mice, suggesting that the age-associated cardiac fibrosis and its prevention in arg-ii<sup>-/-</sup> mice are due to alterations of translational and/or post-translational regulation, including collagen synthesis and/or degradation. The results are in accordance with those reported by other studies in the literature. We have pointed out this aspect on page 5, the 2nd paragraph:

      “The increased cardiac fibrosis in aging is however, associated with decreased mRNA levels of collagen-Ia (col-Ia) and collagen-IIIa (col-IIIa), the major isoforms of pre-collagen in the heart (Suppl. Fig. 2A and 2B), which is a well-known phenomenon in cardiac fibrotic remodelling (Besse et al., 1994; Horn et al., 2016). The results demonstrate that age-associated cardiac fibrosis and prevention in arg-ii<sup>-/-</sup> mice is due to alterations of translational and/or post-translational regulations including collagen synthesis and/or degradation”.

      The results are presented in Suppl. Fig. 2; the legend to Suppl. Fig. 2 is included in the file “Suppl. figure legend_R”. Suppl. Table 2 (primers) is revised accordingly.

      We did not use additional markers to perform apoptosis assays with whole heart, since Fig. 3 provides good evidence that aging is associated with an increase in apoptotic cells in the heart, which is significantly reduced in the arg-ii<sup>-/-</sup> mice. The reduction of TUNEL-positive (apoptotic) cells in aged arg-ii<sup>-/-</sup> mice is mainly due to a decrease in apoptotic cardiomyocytes. With histological analysis, the apoptotic cell types can be well analysed. Moreover, biochemical assays for apoptosis, such as caspase-3 cleavage in whole heart tissue, cannot distinguish apoptotic cell types and may not be sensitive enough for the aging heart, owing to the relatively low number of apoptotic cells in the aging heart as compared to a myocardial infarction model.

      (10) Figure 4: arg-ii has previously been reported to be expressed in rat cardiomyocytes (PMID: 16537391). We strongly suggest the authors verify the expression of Arg-II via immunostaining in isolated cardiomyocytes (using published protocols), and by using multiple different cardiomyocyte-specific markers for colocalization studies to prove the lack of arg-ii expression beyond a reasonable doubt.

      As pointed out by this reviewer, Arg-II was previously reported to be expressed in cardiomyocytes isolated from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, were not included in that study to exclude possible background signals. We made a great effort to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate antibody specificity and assess background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in rat myocardial infarction tissue, Arg-II was not found in cardiomyocytes but in endocardial cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein expression, no Arg-II signal could be detected, while in fibroblasts from the same animals, elevated Arg-II levels under hypoxia are demonstrated (Fig. 5B). Furthermore, RT-qPCR detected arg-ii mRNA in non-cardiomyocytes but not in cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed only at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. These new experiments with rat heart are included in the Methods section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      (11) Figure 6G: It may be worthwhile to supplement arg-ii<sup>-/-</sup> old cells with IL-1beta to see if there is an increase in TUNEL-positive cells.

      IL-1β is a well-known pro-inflammatory cytokine that causes apoptosis in various cell types including cardiomyocytes (Shen Y., et al., Tex Heart Inst J. 2015;42:109–116. doi: 10.14503/THIJ-14-4254; Liu Z., et al., Cardiovasc Diabetol 2015;14,125. doi: 10.1186/s12933-015-0288-y; Li Z., et al., Sci Adv 2020;6:eaay0589. doi: 10.1126/sciadv.aay0589). We very much appreciate the reviewer's interesting idea to investigate the apoptotic responses of cardiomyocytes from arg-ii<sup>-/-</sup> mice to IL-1β. We agree that it is possible that cardiomyocytes from wt and arg-ii<sup>-/-</sup> mice react differently to IL-1β, although the cardiomyocytes do not express Arg-II, as demonstrated in our present study. If this is true, it must be due to non-cell-autonomous effects of a different aging microenvironment in the heart or to epigenetic modulation of the myocytes. We find this a very interesting aspect that requires further extensive investigation. Since our current study focused on the effect of wt and arg-ii<sup>-/-</sup> macrophages on cardiomyocytes and non-cardiomyocytes, we prefer not to include this suggested aspect in our manuscript and would like to explore it in a follow-up study.

      (12) Figures 4-9: It would be interesting to see if the effect of ArgII in cardiac ageing is gender-specific. It is recommended to include experimental data with male mice in addition to the results demonstrated in female mice.

      As pointed out in the manuscript, we have focused on female mice, because the age-associated increase in arg-ii expression is more pronounced in females than in males (Fig. 1A). As suggested by this reviewer, we performed additional experiments investigating the effects of arg-ii deficiency in male mice during aging, focusing on pathophysiological outcomes of ischemia/reperfusion (I/R) injury in ex vivo experiments. The ex vivo functional experiments with the Langendorff system were performed in aged male mice (see Suppl. Fig. 9). Following I/R injury, wt male mice display reduced left ventricular developed pressure (LVDP), as well as reduced inotropic and lusitropic states (expressed as dP/dt max and dP/dt min, respectively). As previously reported (Murphy et al., 2007), we also found that old male mice are more prone to I/R injury than age-matched female animals. Specifically, 15 minutes of ischemia is enough to significantly affect left ventricular contractile function in the male mice (Suppl. Fig. 9). In contrast, age-matched old female mice are relatively resistant to I/R injury, and at least 20 minutes of ischemia is necessary to induce a significant impairment of contractile function (Fig. 10). As in females, the post-I/R recovery of cardiac function is significantly improved in the male arg-ii<sup>-/-</sup> mice as compared to age-matched wt animals. In addition to functional recovery, the infarct size upon I/R injury, as assessed by triphenyl tetrazolium chloride (TTC) staining, is significantly reduced in the age-matched male arg-ii<sup>-/-</sup> animals (Suppl. Fig. 9C and 9D). Altogether, these results reveal a role for Arg-II in heart function impairment during aging in both sexes, with a higher vulnerability to stress in males. These new results are presented in Suppl. Fig. 9 and described on page 10, the last paragraph, and page 11. The results are discussed on page 18, the 2nd paragraph, as follows:

      “The fact that aged females have higher Arg-II but are more resistant to I/R injury seems contradictory to the detrimental effect of Arg-II in I/R injury. It is presumable that cardiac vulnerability to injuries stressors depends on multiple factors/mechanisms in aging. Other factors/mechanisms associated with sex may prevail and determine the higher sensitivity of male heart to I/R injury, which requires further investigation. Nevertheless, the results of our study show that Arg-II plays a role in cardiac I/R injury also in males”.

      The information on the experimental methods in the male animals is included on page 20, the last paragraph and page 21, the 1st paragraph. Legend to Suppl. Fig. 9 is included in the file “Suppl. figure legend_R”.

      (13) Figure 6G: cardiomyocytes from wild-type mice, when treated with macrophages, show 0% TUNEL-positive cells. Since it is unlikely to obtain no TUNEL staining in a cell population, there may be an experimental or analytical error.

      This panel is now Fig. 7F and 7G. The observation is due to our specific experimental procedure. After tissue digestion, cardiomyocytes were plated on laminin-coated dishes. Laminin promotes the adhesion of surviving cells. Following plating, we conducted a thorough washing step to remove damaged and partially adherent cells. This step ensures that only well-shaped, viable, and strongly adherent cells remain as bioassay cells. These “healthy” cells are then selected for the experiments. The apoptotic cells are removed by washing, which explains the high viability of the bioassay cells. We have added this detailed information in the Methods section on page 24, the 2nd paragraph.

      (14) Figure 7J: Please assess whether arg-ii depletion also affects the mtROS phenotype.

      Following the suggestion of this reviewer, we performed new experiments which show that human cardiac fibroblasts (HCFs) exposed to hypoxia (1% O<sub>2</sub>, 48 hours), a known physiological trigger of Arg-II up-regulation, exhibit increased mtROS generation, which involves Arg-II (new Fig. 8M to 8P). We found that Arg-II protein levels and mtROS (assessed by mitoSOX staining) were both enhanced, accompanied by increased levels of HIF1α (Fig. 8M). Moreover, mito-TEMPO pre-incubation reduces mtROS, confirming the mitochondrial origin of the ROS. Silencing of arg-ii with rAd-mediated shRNA significantly reduces mtROS levels, demonstrating a role of Arg-II in the production of mitochondrial ROS in cardiac fibroblasts (Fig. 8M to 8P). We have included these results on page 9, the last paragraph, and discussed them on page 17, the 1st paragraph. The related method is described on page 26, the 2nd paragraph. The legend to Fig. 8 is updated on page 32.

      (15) Figure 8A-E: The authors have treated human-origin endothelial cells with mice-origin macrophage-conditioned media. It would be more suitable to treat the endothelial cells with human-origin macrophage-conditioned media.

      We acknowledge the concern regarding the use of mouse-origin macrophage-conditioned media on human-origin endothelial cells. It is worth noting that biological cross-reactivity of cytokines from one species on cells from a different species has been reported in the literature. It was observed that there is quite a strict threshold of 60% amino acid identity, above which cytokines tend to cross-react, and, statistically, cytokines tend to cross-react more often as their % amino acid identity increases (Scheerlinck JPY. Functional and structural comparison of cytokines in different species. Vet Immunol Immunopathol. 1999; 72:39-44. https://doi.org/10.1016/S0165-2427(99)00115-4). Taking IL-1β as an example, the 17.5 kDa mature mouse and human IL-1β share 92% aa sequence identity, suggesting a high cross-reactivity. Indeed, human IL-1β has shown biological cross-reactivity in mouse cells (Ledesma E., et al. Interleukin-1 beta (IL-1β) induces tumor necrosis factor alpha (TNF-α) expression on mouse myeloid multipotent cell line 32D cl3 and inhibits their proliferation. Cytokine. 2004; 26:66-72. https://doi.org/10.1016/j.cyto.2003.12.009). Moreover, our results also support the reported cross-reactivity between human and mouse IL-1β. The conditioned medium (CM) from mouse macrophages indeed showed biological function in human endothelial cells. The observed effects of the conditioned media from aged wild-type macrophages on endothelial cells were specifically mediated through IL-1β. This conclusion is supported by our data showing that the upregulation induced by the conditioned media was significantly reduced by the addition of an IL-1β receptor blocker.

      (16) The co-culture system would be more interesting to test the non-cell autonomous role of Arg II.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell autonomous role of Arg-II. We believe that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media. The co-culture system could be considered, if the released factor in the conditioned medium is not stable. This is however not the case. So we are confident that our experimental model with conditioned medium is good enough to demonstrate a paracrine effect of cell-cell interaction.

      Reviewer #2 (Recommendations For The Authors):

      Some minor comments may be considered to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19, the last 6 lines.

      (1) The current study showed strong evidence demonstrating the key role of cardiac macrophages in pathologies of cardiac aging, particularly, the macrophages (MФ) from the circulating blood (hematogenous). It is known that the heart is among the minority of organs in which substantial numbers of yolk-sac MФ persist in adulthood and play a crucial role in maintaining cardiac function. Thus, the adult mammalian heart contains two separate and discrete cardiac MФ subgroups, i.e., the resident MФs originated from yolk sac-derived progenitors and the hematogenous MФs recruited from circulating blood monocytes. These two subtypes of MФs may play distinctive roles in the aging heart and the response to cardiac injury. The author could extend the discussion on the possibility of the resident MФs in aging hearts, which could be further investigated in the future.

      We appreciate the suggestion and agree that it provides valuable insight into the study. Taking the comments of reviewer 1 into account, we have performed new experiments, i.e., co-immunostaining to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate to what extent Arg-II affects infiltrated and resident macrophage populations in the aging heart. We found that, in line with the gene expression of f4/80, immunofluorescence staining reveals an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in the age-matched arg-ii<sup>-/-</sup> animals (Fig. 2E, F, G), demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H), are predominant in the aging heart as compared to the infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      (2) It would be beneficial to the readers if the author could provide some explanation about why ArgII could not be detected in VSMCs in the mouse heart and the species difference between humans and mice. In addition, the author may provide an assumption on the possibility that there may also be a cross-talk between macrophages and VSMCs in the aging heart. A little bit more explanation in the Discussion will be helpful.

      We appreciate the suggestion and have discussed these points on page 19 as follows:

      “In this context, another interesting aspect is the cross-talk between macrophages and vascular SMC in the aging heart. In our present study, we could not detect Arg-II in vascular SMC of mouse heart but in that of human heart. This could be due to the difference in species-specific Arg-II expression in the heart or related to the disease conditions in human heart which is harvested from patients with cardiovascular diseases. Indeed, in the apoe<sup>-/-</sup> mouse atherosclerosis model, aortic SMCs do express Arg-II (Xiong et al., 2013). It is interesting to note that rodents hardly develop atherosclerosis as compared to humans. Whether this could be partly contributed by the different expression of Arg-II in vascular SMC between rodents and humans requires further investigation. In our present study, the aspect of the cross-talk between macrophages and vascular SMC is not studied. Since the crosstalk between macrophages and vascular SMC has been implicated in the context of atherogenesis as reviewed (Gong et al., 2025), further work shall investigate whether Arg-II expressing macrophages could interact with vascular SMC in the coronary arteries in the heart and contribute to the development of coronary artery disease and/or vascular remodelling and the underlying mechanisms“.

      (3) Please clarify the arrows in Figure 9C that indicate the infarct area in each splicing section from one heart.

      The arrows in Figure 9C (now Fig. 10C) are indeed utilized to indicate the sections displaying the infarcted area within each splicing section from one heart. We have explained the arrow in the figure legend (now Fig. 10 and also new Suppl. Fig. 9).

  2. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. powerful features for the clients quickly and flexibly.

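      I don't think this fully comes across as a Result. I actually think it would be better to explain that Powerful Feature in a bit more detail.

      Even if it isn't a feature you built yourself, if you roughly know why it is good and how it was built, I think you gain more by writing about it.

      For example:

      Delivered X features to empower Y for clients, enabling RESULT

      ...let's switch to that format for now.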

    1. mydata %>% filter(Sex.Code %in% c(1, 2)) %>% mutate(Sex.Code = as.factor(Sex.Code)) %>% ggplot(aes(x = Sex.Code, fill = District)) + geom_bar(position = "dodge") + facet_wrap(~ Species, scales = "free_y") + labs(title = "Distribution of Sex by Species and Management Area", x = "Sex", y = "Count") + theme_minimal()

      A legend for sex ID is needed here. Also facet by management area and species. This would be much clearer. Also change the factor levels to "Male", "Female", "Unknown".
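One way the note above could be acted on, as a minimal sketch rather than the author's actual fix. It assumes that `Sex.Code` 1 means male and 2 means female (the source does not state the coding), that any other value should be shown as "Unknown", and that `District` is the management area. The small `mydata` frame is a hypothetical stand-in so the example runs on its own.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `mydata`; the real columns and values may differ.
mydata <- data.frame(
  Sex.Code = c(1, 2, 1, 2, 3, 1),
  District = c("A", "A", "B", "B", "A", "B"),
  Species  = c("Coho", "Coho", "Chum", "Chum", "Coho", "Chum")
)

# ASSUMPTION: 1 = Male, 2 = Female; every other code (or NA) becomes "Unknown".
plot_data <- mydata %>%
  mutate(Sex = factor(
    case_when(
      Sex.Code == 1 ~ "Male",
      Sex.Code == 2 ~ "Female",
      TRUE ~ "Unknown"
    ),
    levels = c("Male", "Female", "Unknown")
  ))

# Fill by sex so the legend identifies each sex category, and facet by
# management area (rows) and species (columns).
p <- ggplot(plot_data, aes(x = Sex, fill = Sex)) +
  geom_bar() +
  facet_grid(District ~ Species, scales = "free_y") +
  scale_x_discrete(drop = FALSE) +
  scale_fill_discrete(name = "Sex", drop = FALSE) +
  labs(title = "Distribution of Sex by Species and Management Area",
       x = "Sex", y = "Count") +
  theme_minimal()

print(p)
```

Unlike the original pipeline, this sketch does not filter to codes 1 and 2, because doing so would discard the rows that the "Unknown" level is meant to show.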


    1. The Welsh 'll' is how we write the phoneme (sound) /ɬ/ which is called the voiceless lateral fricative. This sound is not a part of English phonology. In fact the only other European language which has it is Icelandic and then it's only found in clusters. Because English lacks the /ɬ/ sound, people who are unfamiliar with it often struggle to articulate it. Depending on where it is in a word, the English speaker will approximate it as /k/ before /l/ (klan for llan), or as /l/ in isolation (alan for allan) and sometimes /θl/ in medial position (Lanethli for Llanelli). People will always approximate a phoneme which is alien to them. Just as English speakers do not pronounce the French and German /y/ as /y/ but usually as something like /u/. Often phonemes like /x/ and /χ/ are realised as /k/ (e.g. lock for Scottish loch).

      https://www.reddit.com/r/learnwelsh/comments/1l8o22q/why_do_some_people_pronounce_llan_as_klan/

    1. When cherelles infected with Mr reached the late stage of wilt, Mr responded by altering the expression of Mr genes associated with NTP (Bailey et al., 2013). Infected, wilted cherelles did not produce Mr spores but were instead colonized by saprophytic microbes

      Here I could discuss that the cherelles in our study became necrotic and were covered by the fungal mass.

    2. Mr spores are variable in shape and size and have a variable number of nuclei, two being the most common (Díaz-Valderrama and Aime, 2016b; Evans et al., 2002). Initially, Mr spores were considered to be asexually produced conidia (Evans et al., 1978). Evans et al. (2002) later found evidence of a modified meiosis. In a recent finding, Díaz-Valderrama and Aime (2016b) reported that spore production by Mr was of mitotic origin

      This passage discusses conidia and spores and notes the finding that Mr spores are produced by mitosis.

    3. Malformation depends on the age of the pod at the time of infection, with pods less than 1 month old being the most susceptible, as well as on the cacao variety: the younger the pod at the time of infection, the greater the effect on the external expression of symptoms

      This could help me justify the infection in young pods and the expression of symptoms.

    4. However, perhaps the oldest record of the disease comes from the Antioquia region of Colombia, describing the destruction of cacao production in the 1850s by "a virulent growth of velvety fungus that develops into an impalpable powder and attacks only the fruit

      Here the fungal mass is described as an impalpable powder.

    5. Yield reductions range from 50% to 90% for WBD (Meinhardt et al., 2008) and from 10% to 100% for FPR (Phillips-Mora and Wilkinson, 2007).

      Reductions in production.

    6. FPR has been reported to be twice as destructive as black pod rot and more dangerous and difficult to control than witches' broom disease

      This could help me mention that it is more destructive than black pod rot and witches' broom.

    7. Initial infections are asymptomatic, except for swelling of the tissue in some cases.

      Useful because it discusses the asymptomatic phase and could help me discuss my results.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We wanted to clarify Reviewer #1’s latest comment in the last round of review, “Furthermore, the referee appreciates that the authors have echoed the concern regarding the limited statistical robustness of the observed scrambling events.” We appreciate the follow-up information provided by Reviewer #1 that their comment is specifically about the low-count alternative pathway events that we observe at the dimer interface, and not about the statistics of the manuscript overall, as they believe that “the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations (Reviewer #1)”. We agree with the Reviewer and acknowledge that, overall, our coarse-grained study represents the most comprehensive single manuscript on the entire TMEM16 family to date.


      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca2+-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the as-yet unanswered question of how scrambling by TMEM16s occurs in the absence of Ca2+, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer, especially in the absence of Ca2+, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca2+ and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca2+-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think there is, however, value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study, to reduce inconsistencies that may be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in other published studies of single family members. The consistency across our simulations and the high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca2+-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca2+-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca2+-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion on how the choice of force field and simulation parameters influence the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling rates occur over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry observed in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors - in order to preserve the fold - have applied a strong (500 kJ mol-1 nm-2) elastic network. However, I am wondering how the ENM applies across the dimer and if any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How can this have potentially biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism to occur majorly across a single monomer? Answering this question would have far-reaching implications to better describe the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated versus experimental scrambling rates is complicated and is best to be interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes, because it has (1) negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good, standardized protocol and solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature that could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      A weakness of the work is that it does not fully reconcile with the experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, although this is also challenging to do using coarse-grained molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      Answer: It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca2+-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      While we agree with what the reviewer may be hinting at regarding limitations of coarse-grained MD simulations, we believe that our study holds much more merit than this comment suggests. We have provided something that has yet to be done in the field: a comprehensive study that directly compares the scrambling rates of multiple TMEM16 family members in different conformations using identical simulation conditions. Our work clearly shows that a sufficiently dilated groove is the major structural feature that enables robust scrambling for all TMEM16 scramblase members with solved structures. While all TMEM16s cause significant distortion and thinning of the membrane, we assert that the extreme thinning observed around open grooves is significantly enhanced by the lipid scrambling itself as the two leaflets merge through lipid exchange. We saw no evidence that membrane thinning/distortion alone, in the absence of an open groove, could support scrambling at the rates observed under activating conditions or even the low rates observed in Ca2+-independent scrambling. Moreover, our handful of observations of scrambling events outside of the groove, which have not yet been reported in any study, open an exciting new direction for studying alternative scrambling mechanisms. That said, we are currently following up on many of the observations reported here, such as scrambling events outside the groove, the kinetics of scrambling, the possibility that lipids line the groove of non-scramblers like TMEM16A, etc. This is being done experimentally with our collaborators through site-directed mutagenesis and with all-atom MD in our lab. Unfortunately, it is well beyond the scope of the current study to include all of this in the current paper.

      Reviewer #2 (Recommendations for the authors):

      Major comments and questions:

      (1) Line 214 and Figure 1- Figure Supplement 1: why have you only compared the final frame of the trajectory to the cryo-EM structure? Even if these comparisons are qualitative, they should be representative of the entire trajectory, not a single frame.

      We thank the reviewer for this suggestion and replaced the single-frame snapshots in Figure 1-figure supplement 1 with ensemble-averaged headgroup densities. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

      (2) Lines 228-231: You comment 'Residues in this site on nhTMEM16 and TMEM16F also seem to play a role in scrambling but the mechanism by which they do so is unclear.' This is something you could attempt to quantify in the simulations by calculating the correlation between scrambling and protein-membrane interactions/contacts in this site. Can you speculate on a mechanism that might be a contributing factor?

      We probed the correlation between these residues and scrambling lipids, as suggested by the reviewer, and interestingly not all scrambling lipids interact with these residues. Yet there is strong lipid density in this vicinity (see insets in Figure 1 and Figure 4-figure supplement 2). These observations lead us to suspect these residues impact scrambling indirectly through influencing the conformation of the protein or flexibility and shape of the membrane. This interpretation fits with mutagenesis studies highlighting a role for these residues in scrambling (see refs 59, 62, and 67). Specifically, Falzone et al. 2022 (ref 59) suggested that they may thin the membrane near the groove, but this has not been tested via structure determination and a detailed model of how they impact scrambling is missing. We could address this question with in silico mutations; however, CG simulation is not an appropriate method to study large scale protein dynamics, and AA simulations are likely best, but beyond the scope of this paper.

      (3) Lines 240-245 and Figure 1B: This section discusses the coupling between membrane distortions and the sinusoidal curve around the protein, however, Figure 1B only shows snapshots of the membrane distortions. Is it possible to understand how these two collective variables are correlated quantitatively (as opposed to the current qualitative analysis)?

      We believe that it may be possible to quantitatively capture these two key features of the membrane, as we did previously with nhTMEM16 using our continuum elasticity-based model of the membrane (Bethel and Grabe 2016). Our model agreed with all atom MD surfaces to within ~1 Å, hence showing good quantitative agreement throughout the entire membrane. However, we doubt that we could distill the essence of our model down to a simple functional relationship between the sinusoidal wave and pinching, which we think the reviewer is asking. Rather, we believe that the large-scale sinusoidal distortion (collective variable 1) and pinching/distortion (collective variable 2) near the groove arise from the interplay of the specific protein surface chemistry for each protein (patterning of polar and non-polar residues) and the membrane. This is why we chose to simply report the distinct patterns that the family members impose on the surrounding membrane, which we think is fascinating. Specifically, Fig. 1B shows that different TMEM16 family members distort the membrane in different ways. Most notably, fungal TMEM16s feature a more pronounced sinusoidal deformation, whereas the mammalian members primarily produce local pinching. Then, in Fig. 3A we show that the thinning at the groove happens in all structures and is more pronounced in open, scrambling-competent conformations. In other words, proteins can show very strong thinning (e.g. TMEM16K, 5OC9) even though the membrane generally remains flat.

      (4) Lines 257-258: Authors comment that TMEM16A lacks scramblase activity yet can achieve a fully lipid-lined groove (note the typo - should be lipid-lined, not lipid-line). Is a fully lipid-lined groove a prerequisite for scramblase activity? Are lipid-lined grooves the only requirement for scramblase activity? Could the authors clarify exactly what the prerequisite for scramblase activity is to avoid any confusion; this will be useful for later descriptions (i.e. line 295) where scrambling competence is again referred to. Additionally, the associated figure panel (Figure 1D) shows a snapshot of this finding but lacks any statistical quantifications - is a fully lipid-lined groove a single event? Perhaps the additional analyses, such as the groove-lipid contacts, may be useful here.

      The definition of lipid scrambling is that a lipid fully transitions from one membrane leaflet to the other. While a single lipid could transition through the groove on its own, it is well documented in both atomistic and CG MD simulations, that lipid scrambling typically happens through a lipid-lined groove, as shown in Fig. 1A-B. The lipids tend to form strong choline-to-phosphate interactions with nearest neighbors that make this energetically favorable. That said, lipid-lined grooves are not sufficient for robust scrambling, which is what we show in Fig. 1D where the non-scrambler TMEM16A did in fact feature a lipid-lined groove. As suggested, we performed contact analysis and found that residue K645 on TM6 in the middle of the groove contacts lipids in 9.2% of the simulation frames.
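
      The definition of scrambling given above (a lipid fully transitioning from one leaflet to the other) can be turned into a simple event counter. The following is a minimal sketch and not the analysis used in the study: it assumes a per-frame headgroup height `z` measured relative to the bilayer midplane, and the function names and the commitment distance `commit_nm` are hypothetical choices made for illustration.

```python
def count_scrambling_events(z_traj_nm, commit_nm=1.0):
    """Count full leaflet-to-leaflet transitions of one lipid.

    z_traj_nm: headgroup height relative to the bilayer midplane, one value per frame.
    commit_nm: hypothetical distance from the midplane beyond which the lipid
               is considered to belong to a leaflet.
    """
    state = None   # last committed leaflet: "upper", "lower", or None
    events = 0
    for z in z_traj_nm:
        if z > commit_nm:
            new_state = "upper"
        elif z < -commit_nm:
            new_state = "lower"
        else:
            continue  # between the two thresholds: keep the last committed leaflet
        if state is not None and new_state != state:
            events += 1  # the lipid has committed to the opposite leaflet
        state = new_state
    return events


def scrambling_rate(n_events, duration_us):
    """Events per microsecond of simulated time."""
    return n_events / duration_us


# Toy trajectory: upper -> lower -> upper, with midplane excursions in between.
z_example = [1.5, 1.2, 0.3, -0.2, 0.4, 1.3, 0.1, -1.4, -1.6, -0.5, 1.5]
n_events = count_scrambling_events(z_example)
```

      Requiring the headgroup to pass a commitment distance on the opposite side, rather than merely cross the midplane, avoids counting a lipid that lingers near the middle of the bilayer as several events.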

      To get a better understanding of how populated the TM4-TM6 pathway is with lipids across all simulated structures, we determined for every simulation frame how many headgroup beads resided in the groove. This indicates that the ion-conductive state of TMEM16A (5OYB*, Fig. 1D) only had 1 lipid in the pathway, on average, meaning that the configuration shown in Fig. 1D is indeed exceptional. As a reference, our strongest scrambler, nhTMEM16 4WIS, had an average of 2.8 lipids in the groove. We added a table containing the means and standard deviations that resulted from this analysis as Figure 1-Table supplement 1.
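
      The two per-frame statistics described in this response (groove occupancy and residue-lipid contact frequency) can be illustrated as follows. This is a sketch under stated assumptions: the inputs are toy numbers rather than data from the study, and the helper names and the contact cutoff `cutoff_nm` are hypothetical.

```python
from statistics import mean, pstdev


def groove_occupancy(counts_per_frame):
    """Mean and population standard deviation of headgroup beads in the groove."""
    return mean(counts_per_frame), pstdev(counts_per_frame)


def contact_fraction(min_distances_nm, cutoff_nm=0.6):
    """Fraction of frames in which the residue is within cutoff_nm of any lipid bead.

    min_distances_nm: per-frame minimum residue-lipid distance.
    cutoff_nm: hypothetical contact cutoff, chosen only for illustration.
    """
    hits = sum(1 for d in min_distances_nm if d <= cutoff_nm)
    return hits / len(min_distances_nm)


# Toy inputs (not from the study).
counts = [1, 0, 2, 1, 1, 3, 0, 1]
avg_occupancy, sd_occupancy = groove_occupancy(counts)

distances = [0.5, 0.9, 1.2, 0.55, 0.8, 1.0, 0.7, 1.1, 0.95, 0.85]
frac_contact = contact_fraction(distances)
```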

      (5) Lines 295-298: Because the scrambling rates of the Ca²⁺-bound and Ca²⁺-free structures fall within overlapping error margins, it becomes difficult to definitively state that Ca²⁺ binding significantly enhances scrambling activity. This undermines the claim that the Ca²⁺-bound structure is the strongest scrambler. The authors should conduct statistical analyses to determine if the difference between the two conditions is statistically significant.

      In contrast to the reviewer’s comment, we do not claim that Ca2+-binding itself enhances lipid scrambling. Instead, what we show is that WT structures that are solved in an open conformation (all of which are Ca2+-bound, except 6QM6) are robust scramblers. For nhTMEM16, we did not observe any scrambling events for the closed-groove proteins, making further statistical analysis redundant.

      (6) The authors claim that the scrambling rates derived from their MD simulations are in "excellent agreement" with experimental findings (lines 294-295), despite significant discrepancy between simulated and experimentally measured rates. For example, the simulated rate of 24.4 ± 5.2 events/µs for the open, Ca²⁺-bound fungal nhTMEM16 (PDB ID 4WIS) corresponds to approximately 24 million events per second, which is vastly higher than experimental rates. Experimental studies have reported scrambling rate constants of ~0.003 s⁻¹ for TMEM16 family members in the absence of Ca²⁺, measured under physiological conditions (https://doi.org/10.1038/s41467-019-11753-1). Even with Ca²⁺ activation, scrambling rates remain several orders of magnitude lower than the rates observed in simulations. Moreover, this highlights a larger problem: lipid scrambling rates occur over timescales that are not captured by these simulations. While the authors allude to these discrepancies (lines 605-606), they should be emphasised in the text, as opposed to the table caption. These should also be related to differences between the membrane compositions of different studies.

      We agree with the spirit of the reviewer’s comment, and because of that, we were very careful not to claim that we reproduce experimental scrambling rates, just that the trends (scrambling-competent, or not) are correct. On lines 294-295, we actually said that the scrambling rates in our simulations excellently agree with “the presumed scrambling competence of each experimental structure”, which is true. 

      As explained extensively in the discussion section of our paper (and by many others), direct comparison between MD (e.g., Martini 3, but also atomistic force fields) dynamics and experimental measurements is challenging. The primary goal of our paper is to quantify and compare the scrambling capacity of different TMEM16 family members and different states, within a CGMD context.

      That said, we agree with the reviewer that we may have missed rare or long-timescale events (as is the case in any MD experiment) and added this point to the discussion.
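
      The unit conversion behind the reviewer's "24 million events per second" can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted in comment (6); the function names are hypothetical. Note that the two numbers were not obtained under comparable conditions (an open, Ca²⁺-bound simulated structure versus a Ca²⁺-free experimental rate constant), so the resulting ratio is illustrative only.

```python
import math


def per_us_to_per_s(rate_per_us):
    """Convert a rate in events per microsecond to events per second."""
    return rate_per_us * 1e6


def orders_of_magnitude(a, b):
    """Base-10 logarithm of the ratio a / b."""
    return math.log10(a / b)


simulated_per_s = per_us_to_per_s(24.4)   # nhTMEM16 (4WIS) rate quoted in comment (6)
experimental_per_s = 0.003                # Ca2+-free rate constant quoted in comment (6)

# Gap between the two quoted figures, in orders of magnitude.
gap = orders_of_magnitude(simulated_per_s, experimental_per_s)
```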

      (7) To address these discrepancies, the authors should: i) emphasize that simulated rates serve as qualitative indicators of scrambling competence rather than absolute values comparable to experimental findings and ii) discuss potential reasons for the divergence, such as simulation timescale limitations or lipid bilayer compositions that may favor scrambling and force field inaccuracies.

      Please see our answer to question 6. Within the context of our CGMD survey, we confidently call our results quantitative. However, we agree with the reviewer that comparison with experimental scrambling rates is qualitative and should be interpreted with caution. To reflect this, we rewrote the first sentence of the relevant paragraph in the discussion section.

      (8) Line 310: Can the authors provide a rationale as to why one monomer has a wider groove than the other? Perhaps a contact analysis could be useful. See the comment above about ENM.

      The simulation of Ca2+-bound TMEM16K was initiated from an asymmetric X-ray structure in which chain B features a more dilated groove than chain A (PDB 5OC9). The backbones of TM4 and TM6 in the closed groove (A) are close enough together to be directly interconnected by the elastic network. In contrast, TM4 and TM6 in the more dilated subunit (B) are not restricted by the elastic network and, as a consequence, display some “breathing” behavior (Fig. 3B and Fig. 3-Suppl. 6A), giving rise to a ~4x higher scrambling rate. We explicitly added the word “cryo-EM” and the PDB ID to the sentence to emphasize that the asymmetry stems from the original experimental structure.

      When answering this question, we also corrected a mislabeled chain identifier in Fig. 2-Suppl. 3A: the original manuscript read ‘chain A’ where it should have read ‘chain B’.
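
      The elastic-network argument in this response (TM4 and TM6 are tied together only when their backbones are close enough) can be sketched in a few lines. This is an illustration under stated assumptions, not the network builder used in the study: the force constant of 500 kJ mol-1 nm-2 is the value quoted by Reviewer #2, while the distance cutoffs, the function name `elastic_bonds`, and the toy coordinates are hypothetical.

```python
import math


def elastic_bonds(coords_nm, lower_nm=0.0, upper_nm=0.9, k=500.0):
    """Return (i, j, distance, k) for every bead pair within the cutoff window.

    coords_nm: list of (x, y, z) backbone bead positions in nm.
    lower_nm, upper_nm: hypothetical cutoffs, chosen only for illustration.
    k: force constant in kJ mol-1 nm-2 (value quoted in the review).
    """
    bonds = []
    for i in range(len(coords_nm)):
        for j in range(i + 1, len(coords_nm)):
            d = math.dist(coords_nm[i], coords_nm[j])
            if lower_nm <= d <= upper_nm:
                bonds.append((i, j, d, k))
    return bonds


# Toy example: one TM4 bead and one TM6 bead facing each other across the groove.
closed_groove = [(0.0, 0.0, 0.0), (0.6, 0.0, 0.0)]  # close enough to be restrained
open_groove = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]    # beyond the cutoff, free to "breathe"

closed_bonds = elastic_bonds(closed_groove)
open_bonds = elastic_bonds(open_groove)
```

      In this toy case the closed groove receives one spring across TM4-TM6 and the open groove receives none, which mirrors why the asymmetry of the starting structure carries over into the simulation.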

      (9) Line 312: Authors speculate that increased groove width likely accounts for increased scrambling rates. For statistical significance, authors should attempt to correlate scrambling rates and groove width over the simulation period.

      The Reviewer is referring to our description of the scrambling rates we measured for TMEM16K, where we noted that the groove with the highest scrambling rate is also, on average, wider than the groove of the opposite subunit, which stays below 6 Å. We do not suggest that the correlation between scrambling and groove width is continuous, as the Reviewer may have interpreted from our original submission. Rather, we think it is a binary outcome: lipids cannot easily enter narrow grooves (< 6 Å), and hence scrambling can only occur once this threshold is reached, at which point it occurs at a near-constant rate. We showed this for 4 different family members in the original Fig. 3B, where scrambling events (black dots) were much more likely during, or right after, groove dilation to distances > 6 Å.
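
      The threshold picture described here (events occur during, or right after, dilation beyond 6 Å) suggests a simple check: for each scrambling event, ask whether the groove exceeded the threshold in a short window ending at that frame. The sketch below is a hypothetical illustration on toy data; the function name, the window length `lag_frames`, and the example values are assumptions, and only the 6 Å threshold comes from the response.

```python
def fraction_events_after_dilation(widths_A, event_frames, threshold_A=6.0, lag_frames=1):
    """Fraction of events preceded (within lag_frames) by a groove wider than threshold_A.

    widths_A: groove width per frame, in angstroms.
    event_frames: frame indices at which a scrambling event was recorded.
    lag_frames: hypothetical look-back window, in frames.
    """
    if not event_frames:
        return 0.0
    hits = 0
    for frame in event_frames:
        start = max(0, frame - lag_frames)
        window = widths_A[start:frame + 1]  # frames from start up to and including the event
        if any(w > threshold_A for w in window):
            hits += 1
    return hits / len(event_frames)


# Toy data (not from the study): two dilation episodes and three events.
widths = [5.0, 5.0, 6.5, 5.5, 5.0, 5.0, 7.0, 7.0, 5.0, 5.0]
events = [3, 7, 9]
frac_after_dilation = fraction_events_after_dilation(widths, events)
```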

      (10) Line 359: Authors have plotted the minimum distance between residues TM4 and TM6 in Fig. 3A/B, claiming that a wide groove is required for scrambling. Upon closer examination, it is clear that several of these distributions overlap, reducing the statistical significance of these claims. Statistical tests (i.e. KS-tests) should be performed to determine whether the differences in distributions are significant.

      The Reviewer appears to be asking for a statistical test between the six distance distributions represented by the data in Fig. 3A for the scrambling competent structures (6QP6*, 8B8J, 6QM6, 7RXG, 4WIS, 5OC9), and we think this is being asked because it is believed that we are making a claim that the greater the distance, the greater the scrambling rate. If we have interpreted this comment correctly, we are not making this claim. Rather, we are simply stating that we only observe robust scrambling when the groove width regularly separates beyond 6 Å. The full distance distributions can now be found in Figure 3-figure supplement 6B, and we agree there is significant overlap between some of these distributions. However, the distinguishing characteristic of the 6 distributions from scrambling competent proteins is that they all access large distances, while the others do not. Notably, TMEM16F proteins (6QP6*, 8B8J) are below the 6 Å threshold on average, but they have wide standard deviations and spend well over ¼ of their time in the permissive regime (the upper error bar in the whisker plots in Fig. 3A is the 75% boundary).
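The point that two distance distributions can overlap substantially while only one of them reaches the permissive regime can be sketched as follows. This is an illustrative example with made-up numbers, not the data behind Fig. 3A; the function names are hypothetical, and the two-sample Kolmogorov-Smirnov statistic is computed by hand here rather than with any particular statistics package.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gaps = []
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gaps.append(abs(cdf_a - cdf_b))
    return max(gaps)


def fraction_above(values: Sequence[float], threshold: float = 6.0) -> float:
    """Share of frames spent above the groove-width threshold (Å)."""
    return sum(v > threshold for v in values) / len(values)


# Hypothetical TM4/TM6 minimum distances (Å) for two simulated structures.
closed = [4.0, 4.2, 4.5, 4.8, 5.0, 5.1, 5.3, 5.5]
breathing = [4.5, 5.0, 5.2, 5.6, 5.9, 6.3, 6.8, 7.4]

print(ks_statistic(closed, breathing))  # below 1, so the distributions overlap
print(fraction_above(closed))           # never enters the permissive regime
print(fraction_above(breathing))        # spends part of its time above 6 Å
```

In this toy case the time spent above 6 Å separates the two groove types cleanly even though the distributions overlap, which mirrors the argument made in the reply.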

      (11) Line 363-364: The authors state that all TMEM16 structures thin the membrane. Could the authors include a description of how membrane thinning is calculated, for instance, is the entire membrane considered, or is thinning calculated on a membrane patch close to the protein? Do membrane patches closer to the transmembrane protein increase or decrease thickness due to hydrophobic packing interactions? The latter question is of particular concern since Martini3 has been shown to induce local thinning of the membrane close to transmembrane helices, yielding thicknesses 2-3 Å thinner than those reported experimentally (https://doi.org/10.1016/j.cplett.2023.140436). This could be an important consideration in the authors' comparison to the bulk membrane thickness (line 364). Finally, how is the 'bulk membrane thickness' measured (i.e., from the CG simulations, from AA simulations, or from experiments)?

      Regarding the calculation of thinning and bulk membrane thickness, as described in Method “Quantification of membrane deformations”, the minimal membrane thickness, or thinning, is defined as the shortest distance between any two points from the interpolated upper and lower leaflet surfaces constructed using the glycerol beads (GL1 and GL2). Bulk membrane thickness is calculated by taking the vertical distance between the averaged glycerol surfaces at the membrane edge.
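The two definitions given in this reply can be written out as a minimal sketch. It assumes each leaflet surface is already available as a list of interpolated (x, y, z) points; the grid, the coordinates, and the `is_edge` predicate are hypothetical and stand in for whatever interpolation and edge definition the Methods actually use.

```python
import math
from itertools import product
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


def minimal_thickness(upper: List[Point], lower: List[Point]) -> float:
    """Shortest 3D distance between any upper-surface and lower-surface point."""
    return min(math.dist(p, q) for p, q in product(upper, lower))


def bulk_thickness(
    upper: List[Point], lower: List[Point], is_edge: Callable[[float, float], bool]
) -> float:
    """Vertical distance between the averaged surfaces over edge points only."""
    up = [z for x, y, z in upper if is_edge(x, y)]
    lo = [z for x, y, z in lower if is_edge(x, y)]
    return sum(up) / len(up) - sum(lo) / len(lo)


# Hypothetical 3x3 grid (Å): flat leaflets at z = +20 and -20, pinched to
# +5 and -5 at the central grid point where the protein would sit.
grid = [0.0, 10.0, 20.0]
upper = [(x, y, 5.0 if (x, y) == (10.0, 10.0) else 20.0) for x in grid for y in grid]
lower = [(x, y, -5.0 if (x, y) == (10.0, 10.0) else -20.0) for x in grid for y in grid]


def is_edge(x: float, y: float) -> bool:
    """Treat every grid point except the centre as membrane edge (assumption)."""
    return (x, y) != (10.0, 10.0)


print(minimal_thickness(upper, lower))        # thinnest point of the bilayer
print(bulk_thickness(upper, lower, is_edge))  # thickness far from the protein
```

Using the full 3D distance rather than the vertical separation matters when the two leaflets bend toward each other at laterally offset positions, as in a sinusoidal deformation.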

      The concern about localized membrane deformation due to force field artifacts is well-founded. However, the sinusoidal deformations shown here are much greater than the 2-3 Å Martini3 imperfections, and they extend for up to 10 Å radially away from the protein into the bulk membrane (see Figure 3-figure supplement 1-5 for more of a description). Most importantly, the sinusoidal wave patterns set up by the proteins are very similar to those described in the previous continuum calculation and all-atom MD for nhTMEM16 (https://www.pnas.org/doi/full/10.1073/pnas.1607574113).

      (12) Line 374: The authors state a 'positive correlation' between membrane thinning/groove opening and scrambling rates. To support this claim, the authors should report the correlation coefficients.

      We have removed any discussion concerning correlations between the magnitude of the scrambling rate and the degree of membrane thinning/groove opening. Rather we simply state that opening beyond a threshold distance is required for robust scrambling, as shown in our analysis in Fig. 3A.

      Concerning the relation between thinning and scrambling: Instantaneous membrane thinning is poorly defined (because it is governed by fluctuations of single lipids), and therefore difficult to correlate with the timing of individual scrambling events in a meaningful way.  Moreover, as we state later in that same section, “we argue that the extremely thin membranes are likely correlated with groove opening, rather than being an independent contributing factor to lipid scrambling”.

      (13) Line 396: It is stated that TMEM16A is not a scramblase but the simulating scrambling activity is not zero. How can you be sure that you are monitoring the correct collective variable if you are getting a false positive with respect to experiments?

      We only observe 2 scrambling events in 10 ms, which is a very small rate compared to the scrambling competent states. A previous large-scale Martini CG simulation survey that inspired our protocol (Li et al, PNAS 2024) employed a 1 event/ms cut-off to distinguish scramblers from non-scramblers. Hence, they would have called TMEM16A a non-scrambler as well. We expect that a false positive in this context might be an artifact of the CG forcefield, or it could be that TMEM16A can scramble but too slowly to be experimentally detected. Regarding the collective variable for lipid flipping, it is correct, and we know that this lipid actually flipped.
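The cut-off reasoning in this reply is simple arithmetic and can be sketched as below. The event count, simulation length, and cut-off are the values quoted in the response, in the time units it uses. The function names are hypothetical, and treating a rate exactly at the cut-off as a scrambler is an assumption, since the response does not say how ties are handled.

```python
def scrambling_rate(n_events: int, sim_time: float) -> float:
    """Events per unit simulation time (same time unit as the quoted cut-off)."""
    return n_events / sim_time


def is_scrambler(n_events: int, sim_time: float, cutoff: float = 1.0) -> bool:
    """Classify as scrambler when the rate reaches the cut-off.

    Assumption: a rate equal to the cut-off counts as a scrambler.
    """
    return scrambling_rate(n_events, sim_time) >= cutoff


# Values quoted in the response for TMEM16A: 2 events over a length of 10.
print(scrambling_rate(2, 10.0))
print(is_scrambler(2, 10.0))
```

Under these numbers the observed rate sits well below the cut-off, so the same rule that labels the open-groove structures as scramblers labels TMEM16A as a non-scrambler.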

      (14) Line 402: Distance distributions for the electrostatic interactions between E633 and K645 should be included in the manuscript. This is also the case for the interactions between E843-K850 (lines 491-492).

      Our description of interactions between lipid headgroups and E633 and K645 in TMEM16A (5OYB*) is based on qualitative observations of the MD trajectory, and we highlight an example of this interaction in Figure 3-video 4. The video clearly shows that the lipid headgroups in the center of the groove orient themselves such that the phosphate bead (red) rests just above K645 (blue) and at other times the choline bead (blue) rests just below E633 (red). We do not think an additional plot with the distance distributions between lipids and these residues will add to our understanding of how lipids interact with residues in the TMEM16A pore.

      We made a similar qualitative observation for the interaction between the POPC choline and E843 and between the POPC phosphate and K850 while watching the AAMD simulation trajectory of TMEM16F (PDB ID 6QP6). Given that this was a single observation, and the same interactions do not appear in the CG simulation of the same structure (see simulation snapshots in Figure 4-figure supplement 5), we do not think additional analysis would add significantly to our understanding of which residues may stabilize lipids in the dimer interface.

      (15) Lines 450-451: 'As the groove opens, water is exposed to the membrane core and lipid headgroups insert themselves into the water-filled groove to bridge the leaflets.' Is this a qualitative observation? Could the authors report the correlation between groove dilation and the number of water permeation events?

      Yes, this is qualitative, and it sketches the order of events during scrambling, and we revised the main text starting at line 450 to indicate this. As illustrated by the density isosurfaces in Appendix 1-Figure 2A, the amount of water found in the closed versus open grooves is striking – there is a significant flood of water that connects the upper and lower solutions upon groove opening. Moreover, Appendix 1-Figure 2B shows much greater water permeation for open structures (4WIS, 7RXG, 5OC9, 8B8J, …) compared to closed structures (6QMB, 6QMA, 8B8Q, and many of the non-labeled data in the figure that all have closed grooves and near 0 water permeation). A notable exception is TMEM16A (7ZK3*8), which has water permeation but a closed groove and little-to-no lipid scrambling.

      Minor Comments:

      (1) Inconsistent use of '10' and 'ten' throughout.

      We would like to kindly point out that we did not find examples of inconsistent use.

      (2) Line 32: 'TM6 along with 3, 4 and 5...' should be 'TM6 along with TM3, TM4 and TM5...'. Same in line 142. Naming should stay consistent.

      Changes are reflected in the updated manuscript.

      (3) Line 141: do you mean traverse (i.e. to travel across)? Or transverse (i.e. to extend across the membrane)?

      This is a typo. We meant “traverse”. Thanks for pointing it out.

      (4) Line 142: 'greasy' should be 'strongly hydrophobic'.

      Changes are reflected in the updated manuscript.

      (5) Line 143-144: "credit card mechanism" requires quotation marks.

      Changes are reflected in the updated manuscript.

      (6) Line 144: state if Nectria haematococca is mammalian or fungal, this is not obvious for all readers.

      Changes are reflected in the updated manuscript.

      (7) Line 147-148: Is TMEM16A/TMEM16K fungal or mammalian? What was the residue before the mutation and which residue is mutated? Perhaps the nomenclature should read as TMEM16X10Y where X=the residue prior to the mutation, 10 is a placeholder for the residue number that is mutated and Y=the new residue following mutation.

      “TMEM16” is the protein family. “A” denotes the specific homolog rather than residue.  

      (8) Lines 157-158: same as 10, it is unclear if these are fungal or mammalian.

      Clarifications added.

      (9) Line 184: "...CGMD simulation" should be "...CGMD simulations".

      Changes made.

      (10) Line 191-192: It would help to create a table of all of the mutants (including if they are mammalian or fungal) summarizing the salt concentrations, lipid and detergent environments, the presence of modulators/activators, etc.

      We added this information to Appendix 1-Table 1 in the supplemental information. We did not specify NaCl concentrations, because all experimental procedures used standard physiological values for this (100-150 mM).

      (11) Line 210: inconsistencies with 'CG' and 'coarse-grain'.

      Changes made.

      (12) Figure 1 caption: '...totaling ~2μs (B)...' is missing the fullstop after 2μs.

      Changes made.

      (13) Figure 1B: it may be useful to label where the Ca2+ ion binds or include a schematic.

      We updated Fig. 1A to illustrate where Ca2+ binds.

      (14) Line 311: Are these mean distances? The authors should add standard deviations.

      Yes, they are. We added the standard deviations to the text.

      (15) Line 321-322: Perhaps a schematic in Figure 2 would be useful to visualize the structural features described here.

      We would kindly refer interested readers to reference [60].

      (16) Line 377: '...are likely a correlate of groove opening...' should read as: '...are likely correlated to groove opening...'.

      Thank you for pointing it out. Changes made.

      (17) Line 398: the '...empirically determined 6Å threshold for scrambling.' Was this determined from the simulations or from experiments? What does "empirically" mean here? Please state this.

      This value was determined from the simulations. Based on our analysis of the correlation between scrambling rate and groove dilation, we found that the minimal TM4/6 distance of 6 Å can distinguish between the high and low activity scramblers. The exact numerical value is somewhat arbitrary as there is a range of values around 6 Å that serve to distinguish scramblers from non-scramblers.

      (18) Figure 4: This figure should be labelled as A, B, C and D, with the figure caption updated accordingly.

      We updated Figure 4 and its caption.

      Reviewer #3 (Recommendations for Authors):

      The authors must do additional simulations to further validate their claim with different lipids and further substantiate dimer interface independent of Ca2+ ions.

      Thank you for the suggestion. We completely agree that studying scrambling in the context of a diverse lipid environment is an exciting area to explore. We are indeed actively working on a project built on a similar idea. We decided not to include that study because we think the additional discussion involved would be excessive for the current manuscript. We do, however, look forward to publishing our findings in a separate manuscript in the near future. In terms of Ca2+-independent scrambling, we are planning mutagenesis studies with our experimental collaborator that target the residues we identified along the dimer interface.

      Since calcium ions are critical for the stability of these structures, authors should show that they were placed throughout the simulations consistently.

      As stated in the method section “Coarse-grained system preparation and simulation detail”, all Ca2+ ions are manually placed into the coarse-grained structure from the beginning of the simulation at their identical corresponding position in the experimental structure and harmonically bonded to adjacent acidic residues throughout the duration of simulation. We have also added a label to Fig 1A to indicate where the two Ca2+ ions are located.

      The comparison with experimental structures should be consistent with complete simulation, and not the last structure of the trajectory. Depending on the conformational variability, this might be misleading.

      We agree and updated Fig. 1-supplement figure 1 accordingly. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

    1. A closed interval, denoted by square brackets, means the endpoints are included in the interval. For example, $[50, 150]$ in Table 11.1 is a closed interval with endpoints 50 and 150. There are 7 population units with biomass (Mg/ha) that fall within this interval. A closed interval can also be written using $\le$ notation, e.g., $[50, 150] = \{50 \le y \le 150\}$. An open interval, denoted by parentheses, means the endpoints are not included in the interval. Following from open and closed notation, a half-open interval includes only one of its endpoints, and is denoted by one parenthesis and one square bracket, e.g., there are 8 units that fall within the half-open interval $(150, 250]$ or $\{150 < y \le 250\}$.

      I think this was mentioned in a previous chapter, maybe when building stand and stock tables. Perhaps it's a good idea to mention it again, but I thought I'd point it out.
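The interval notation in the quoted passage can be checked with a short sketch. The biomass values below are made up for illustration and are not the data in Table 11.1, so the counts differ from the 7 and 8 units mentioned in the text; `in_interval` is a hypothetical helper.

```python
from typing import List


def in_interval(
    y: float, lo: float, hi: float, closed_lo: bool = True, closed_hi: bool = True
) -> bool:
    """True when y lies in the interval; each endpoint is included only if closed."""
    left_ok = y >= lo if closed_lo else y > lo
    right_ok = y <= hi if closed_hi else y < hi
    return left_ok and right_ok


# Hypothetical biomass values (Mg/ha), chosen to sit on and around the endpoints.
biomass: List[float] = [50, 75, 150, 150.5, 200, 250, 260]

closed_count = sum(in_interval(y, 50, 150) for y in biomass)                        # [50, 150]
open_count = sum(in_interval(y, 50, 150, False, False) for y in biomass)            # (50, 150)
half_open_count = sum(in_interval(y, 150, 250, closed_lo=False) for y in biomass)   # (150, 250]

print(closed_count, open_count, half_open_count)
```

The value 150 is counted in $[50, 150]$ but not in $(150, 250]$, so adjacent classes built this way never count a unit twice.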

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      In their manuscript de las Mercedes Carro et al investigated the role of Ago proteins during spermatogenesis by producing a triple knockout of Ago 1, 3 and 4. They first describe the pattern of expression of each protein and of Ago2 during the differentiation of male germ cells, then they describe the spermatogenesis phenotype of triple knockout males, study gene deregulation by scRNA seq and identify novel interacting proteins by co-IP mass spectrometry, in particular BRG1/SMARCA4, a chromatin remodeling factor and ATF2 a transcription factor. The main message is that Ago3 and 4 are involved in the regulation of XY gene silencing during meiosis, and also in the control of autosomal gene expression during meiosis. Overall the manuscript is well written, the topic, very interesting and the experiments, well-executed. However, there are some parts of the methodology and data interpretation that are unclear (see below).

      Major comments

      1= Please clarify how the triple KO was obtained, and if it is constitutive or specific to the male germline. In the result section a Cre (which cre?) is mentioned but it is not mentioned in the M&M. On Figure S1, a MICER VECTOR is shown instead of a deletion, but nothing is explained in the text nor legend. Could the authors provide more details in the results section as well as in the M&M ? This is essential to fully interpret the results obtained for this KO line, and to compare its phenotype to other lines (such as lines 184-9 Comparison of triple KO phenotype with that of Ago4 KO). Also, if it is a constitutive KO, the authors should mention if they observed other phenotypes in triple KO mice since AGO proteins are not only expressed in the male germline.

      Response: We apologize for omitting this vital information. We have now incorporated a more detailed description of how the Ago413 mutant was created in the results and M&M sections (line 120 and 686 respectively).

      As mentioned in the manuscript, Ago4, Ago1 and Ago3 are widely expressed in mammalian somatic tissues. Mutations or deletions of these genes do not disrupt development; however, there is limited research on the impact of these mutations in mammalian models in vivo. In humans, mutations in Ago1 and Ago3 genes are associated with neurological disorders, autism and intellectual disability (Tokita, M.J.,et al. 2015- doi: 10.1038/ejhg.2014.202., Sakaguchi et al. 2019- doi: 10.1016/j.ejmg.2018.09.004, Schalk et al 2021- doi: 10.1136/jmedgenet-2021-107751). In mouse, simultaneous global deletion of Ago1 and Ago3 was shown to increase susceptibility to influenza virus through impaired inflammation responses (Van Stry et al 2012- doi.org/10.1128/jvi.05303-11). Studies performed in female Ago413 mutants (the same mutant line used herein) have shown that knockout mice present postnatal growth retardation with elevated circulating leukocytes (Guidi et al 2023- doi: 10.1016/j.celrep.2023.113515). Other studies of double conditional knockout of Ago1 and Ago3 in the skin associated the loss of these Argonautes with decreased weight of the offspring and severe skin morphogenesis defects (Wang et al 2012- doi: 10.1101/gad.182758.111). In our study, we did not observe major somatic or overt behavioral phenotypes, and we did not observe statistical differences in body weights of null males compared to WT as shown in figure below.

      2= The paragraph corresponding to G2/M analysis is unclear to me. Why was this analysis performed? What does the heatmap show in Figure S4? What is G2/M score? (Fig 2D). Lines 219-220, do the authors mean that Pachytene cells are in a cell phase equivalent to G2/M? All this paragraph and associated figures require more explanation to clarify the method and interpretation.

      Response: We have modified the methods to include more information about how the cell cycle scoring used in Figures 2D and S4 was calculated and will add more information regarding the interpretation of these figures.

      3= I have concerns regarding Fig2G: to be convincing, the analysis needs to be performed on several replicates, and it is essential to compare tubules of the same stage, which does not appear to be the case. Besides, co (immunofluorescent) staining with markers of different cell types should be shown to demonstrate the earlier expression of some markers and their colocalization with markers of the earlier stages.

      Response: We agree with the Reviewer. New images with staged tubules will be added to the analysis of Figure 2G.

      4= One important question that I think the authors should discuss regarding their scRNAseq: clusters are defined using well characterized markers. But Ago triple KO appears to alter the timing of expression of genes... could this deregulation affect the interpretation of scRNAseq clusters and results?

      Response: We thank the reviewer for this suggestion and agree that including this information is important. We expect that, at most, this dysregulation impacts the edges of these clusters slightly. Given that marker genes that have been used to define cell types in these data are consistently expressed between the knockout and wildtype mice (see Figure S4A), we do not think that the cells in these clusters have different identities, just dysregulated expression programs. We have added the relevant sentence to the discussion, and will include additional supplemental figure panels to document this point more comprehensively.

      5= XY gene deregulation is mentioned throughout the result section but only X chromosome genes seem to have been investigated... Even though the gene content of the Y is highly repetitive, it would be very interesting to show the level of expression of Y single copy and Y multicopy genes in a figure 3 panel.

      Response: We agree with the reviewer that including analysis of Y-linked genes is important. We will add a supplemental figure which includes the Y:Autosome ratio and differential expression analysis.

      6= Can the authors elaborate on the observation that X gene upregulation is visible in the KO before MSCI; that is in lept/zygotene clusters (and in spermatogonia, if the difference visible in 3A is significant?)

      Response: We do see that X gene expression is upregulated before pachynema. Previous scRNA-seq studies that have looked at MSCI have seen that silencing of genes on the X and Y chromosomes starts before the cell clusters that are defined as pachynema, though silencing is not fully completed until pachynema. We have clarified this point in the manuscript.

      7 = miRNA analysis: could the authors indicate if X encoded miRNA were identified and found deregulated? Because Ago4 has been shown to lead to a downregulation of miRNA, among which many X encoded. It is therefore puzzling to see that the triple KO does not recapitulate this observation. Were the analyses performed differently in the present study and in Ago4 KO study?

      Response: The analysis identifying downregulation of miRNA in the original Ago4 mutant analysis was conducted relative to total small RNA expression. Amongst those altered miRNA families in the Ago4 mutants, we demonstrated both upregulation and downregulation of miRNA. We agree that confirming a similar global downregulation of miRNA counts compared to other small RNAs is important. Therefore, in a revised manuscript, we will add this information to the miRNA analysis section, especially highlighting the X chromosome-associated miRNAs, as well as whether the ratios between other small RNA classes change.

      8 = The last results paragraph would also benefit from some additional information. It is not clear why the authors focused on enhancers and did not investigate promoters (or maybe they were but it's unclear). Which regions (size and location from TSS) were investigated for motif enrichment analyses? To what correspond the "transcriptional regulatory regions previously identified using dREG" mentioned in the M&M? I understand it's based on a previous article, but more info in the present manuscript would be useful.

      Response: We thank the reviewer for this suggestion. The regions that were used for motif enrichment will be included as supplementary information in the fully revised manuscript. We have also clarified in the methods that these transcriptional regulatory regions were obtained from a previous analysis of ChRO-seq data downloaded from GEO. These data were run through the dREG pipeline, which identifies regions predicted to contain transcription start sites, including promoters and enhancers.

      Minor comments

      1) In the introduction: The sentence "Ago1 is not expressed in the germline from the spermatogonia stage onwards allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis." is misleading because Ago1 is expressed at least in spermatogonia; It would be more precise to write "after spermatogonia stage" and rephrase the sentence. Otherwise it is surprising to see AGO1 protein in testis lysate and it is not in line with the scRNA seq shown in figure 2.

      Response: We agree with the Reviewer's suggestion and have edited the sentence on line 100. This sentence now reads "Ago1 is not expressed in the germline after the spermatogonia stage allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis".

      2) Could the authors specify whether AGO proteins are expressed in other tissues? In somatic testicular cells?

      Response: Expression patterns of mammalian AGOs have been described in somatic and testicular tissues for the mouse by Gonzales-Gonzales et al (2008) by qPCR. They found that Ago2 is expressed in all the somatic tissues analyzed (brain, spleen, heart, muscle and lung) as well as the testis, with the highest expression in brain and lowest in heart. Ago1 is highly expressed in spleen compared to all the tissues analyzed, while Ago3 and Ago4 showed highest expression in testis and brain. Within somatic tissues of the testis, the four Argonautes are expressed in Sertoli cells; however, expression of Ago1, 3 and 4 is very low compared to Ago2, with the latter showing a 10-fold higher transcript level. We have included a sentence with this information in the introduction in line 89.

      3) Pattern of expression: How do the authors explain that AGO3 disappears at the diplotene stage and reappears in spermatids?

      Response: Single cell RNAseq data in the germline shows reduced transcript for Ago3 from the Pachytene stage onwards, suggesting minimal if any new transcription in round spermatids. We hypothesize that the AGO3 protein present in the round spermatid stage is cytoplasmic, presumably coming from the pool of AGO3 in the chromatoid body, a cytoplasmic structure with functional association with the nucleus in round spermatids (Kotaja et al, 2003 doi: 10.1073/pnas.05093331).

      4) It would be useful to show the timing of expression of AGO 1 to 4 throughout spermatogenesis in the first paragraph of the article. Maybe the authors could present data from fig2B earlier?

      Response: We understand the Reviewer's concern; however, given that Ago expression throughout spermatogenesis was obtained from scRNA seq, we consider that these data should be presented after introducing the Ago413 knockout and the scRNA seq experiment. As Ago1-4 expression in the mouse male germline was also described in an earlier manuscript by Gonzales-Gonzales et al, and our data align with this report, we included a sentence about these previous findings in the earlier results section.

      5) Line 190: please modify the sentence "reveal no differences in cellular architecture of the seminiferous tubules when compared to wild-type males" to " reveal no gross differences..." since even without quantification of the different cell types it is visible that KO seminiferous tubules are different from WT tubules.

      Response: We agree with the reviewer, and we modified line 190 (now 173) as suggested. Grossly, seminiferous tubules from Ago413 null males contain the same cell types as in wild type tubules, including spermatozoa. However, our studies show that the number and quality of germ cells are compromised in knockouts, as shown by sperm counts and TUNEL staining.

      6) TUNEL analysis: please stage the tubules to determine the stage(s) at which apoptosis is the most predominant.

      Response: We have complied with the reviewer's suggestion. Figure 1G now shows staged seminiferous tubules, and we have replaced the wild type image with one where the staged tubules match the knockout image.

      7) Figure S4B does not show an increase of cells at Pachytene stage but at Lepto/zygotene stage (as well as an increase of spermatogonia). Please comment on this discrepancy with results shown in Fig2.

      Response: Figures 2 and S4 show the distribution of cells in different substages of spermatogenesis and prophase I measured with very different methods: a cytological approach using chromosome spreads vs. a transcriptomic approach that involves clustering of cells. We attribute the differences in cell type distribution to differences in the sensitivity of the methods to identify each cell type and therefore to detect differences in the number of cells for each group. Moreover, our scRNA-seq data groups the leptotene and zygotene stages together, while the cytological approach allows for separation of these two sub-stages. Importantly, both results show that Ago413 spermatocytes are progressing slower from pachynema into diplonema and/or are dying after pachynema, as stated in line 194 in our manuscript.

      8) Fig5H and 5I are not mentioned in the result section. Also, it would be useful to label them with "all chromosomes" and "XY" to differentiate them easily

      Response: We apologize for the omission and have now cited Figures 5H and 5I in the manuscript (line 453). We have added the suggested labels.

      9) Line 530 "data provide further evidence for a functional association between AGO-dependent small RNAs and heterochromatin formation, maintenance and/or silencing." Please rephrase, the present article does not really show that AGO nuclear role depends on small RNAs.

      Response: We agree with the reviewer that these data do not directly show a dependence on small RNAs. As our identified localization of AGO proteins to the pericentric heterochromatin coincides with localization of DICER shown previously by Yadav and collaborators (2020, doi: 10.1093/nar/gkaa460), we do believe that our data further implicate small RNAs in the silencing of heterochromatin. Yadav et al show that DICER localizes to pericentromeric heterochromatin and processes major satellite transcripts into small RNAs in mouse spermatocytes, and that cKO germ cells have reduced localization of SUV39H2 and H3K9me3 to the pericentromeric heterochromatin. Given the colocalization of both small RNA producing machinery and AGOs at pericentromeric heterochromatin, the AGOs may bind these small RNAs. The statement in line 530 refers to how our results provide evidence for the involvement of other RNAi machinery in the silencing of pericentromeric heterochromatin investigated by Yadav et al, which likely includes small RNAs.

      To clarify this point, we have modified the text accordingly.

      10) Line 1256: replace "cite here " by appropriate reference

      Response: The reference was added to line 1256.

      11) Please use SMARCA4 instead of BRG1 name as it is its official name.

      Response: We have replaced BRG1 with SMARCA4 in the text and figures.

      Figures:

      Figure 1: Are the pictures shown for Ago3-tagged and floxed from the same stages ? The leptotene stage in 1A looks like a zygotene, while some pachytene/diplotene stage pictures do not look alike.

      Response: New representative images have been added to figure 1 to match the same substages across the figure.

      Figure 1D, please label the Y scale properly (testis weight related to body weight)

      Response: We have fixed this.

      FigS1: Please comment the presence of non-specific bands in the figure legend

      Response: We have added a sentence in Figure S1 Legend.

      Fig 2E and F, please indicate on the figure (in addition to its legend), what are the X and Y axes respectively to facilitate its reading.

      Response: X and Y axes are now labelled in Figure 2E and F.

      2F: please use an easier abbreviation for Spermatocyte than Sp (which could mean spermatogonia, sperm, etc.) such as Scyte I? (same comment for Fig 3C)

      Response: The abbreviation for spermatocyte was changed from Sp to Scyte I in Figures 2 and 3.

      Overall, for all figures showing GSEA analyses, could the authors explain what a High positive NES and a High negative NES mean in the results section?

      Response: Thank you for this suggestion. We have added this information where the GSEA score of the cell markers is initially introduced.

      Significance

      Ago proteins are known for their roles in post-transcriptional gene regulation via small RNA-mediated cleavage of mRNA, which takes place in the cytoplasm. Some Ago proteins have also been shown to be located in the nucleus, suggesting other non-canonical roles. This is the case for Ago4, which has been shown to localize to the transcriptionally silenced sex chromosomes (called the sex body) of the spermatocyte nucleus, where it contributes to regulating their silencing (Modzelewski et al 2012). Interestingly, Ago4 knockout leads to Ago3 upregulation, including on the sex body, indicating that Ago3 and Ago4 are involved in the same nuclear process. In their manuscript, de las Mercedes Carro et al. investigate the consequences of loss of both Ago3 and Ago4 in the male germline by the production of a triple knockout of Ago1, 3 and 4 in the mouse. With this model, the authors describe the role of Ago3 and Ago4 during spermatogenesis and show that they are involved in sex chromosome gene repression in spermatocytes and in round spermatids, as well as in the control of autosomal meiotic gene expression. Triple KO males have impaired meiosis and spermiogenesis, with fewer and abnormal spermatozoa resulting in reduced fertility. Since Ago1 male germline expression is restricted to pre-meiotic germ cells, it is not expected to contribute to the meiotic and postmeiotic phenotypes observed in the triple KO. The strengths of the study are i) the thorough analyses of mRNA expression at the single cell level, and in purified spermatocytes and spermatids (bulk RNAseq), ii) the identification of novel nuclear partners of AGO3/4 relevant for their described nuclear role: ATF2, which they show to also co-localize with the sex body, and BRG1/SMARCA4, a SWI/SNF chromatin remodeler. The main limitation of the study is the lack of information in the methods regarding the production of the triple KO, as well as some aspects of the transcriptome and motif analyses. It is also surprising to see that the triple KO does not recapitulate the miRNA deregulation observed in Ago4 KO. The characterization of a non-canonical role of AGO3/4 in male germ cells will certainly influence researchers of the field, and also interest a broader audience studying Argonaute proteins and gene regulation at transcriptional and posttranscriptional levels.

      Reviewer #2

      Evidence, reproducibility and clarity

      In the manuscript titled "Argonaute proteins regulate the timing of the spermatogenic transcriptional program" by Carro et al., the authors present their findings on how Argonaute proteins regulate spermatogenic development. They utilize a mouse model featuring a deletion of the gene cluster on chromosome 4 that contains Ago1, Ago3, and Ago4 to investigate the cumulative roles of AGO3 and AGO4 in spermatogenic cells. The authors characterize the distribution of AGO proteins and their effects on key meiotic milestones such as synapsis, recombination, meiotic transcriptional regulation, and meiotic sex chromosome inactivation (MSCI). They analyze stage-specific transcriptomes in spermatogenic cells using single-cell and bulk RNA sequencing and determine the interactome of AGO3 and AGO4 through mass spectrometry to examine how AGO proteins may regulate gene expression in these cells during meiotic and post-meiotic development. The authors conclude that both AGO3 and AGO4 are essential for regulating the overall gene expression program in spermatogenic cells and specifically modulate MSCI to repress sex-linked genes in pachytene spermatocytes, which may be partially mediated by the proper distribution of DNA damage repair factors. Additionally, AGO3 is suggested to interact with the chromatin remodeler SWI/SNF factor BRG1, facilitating its removal from the sex-chromatin to enable the repression of sex-linked genes during MSCI.

      Major Comments: 1. The study utilized a triple knockout mouse model to determine the effect of AGO3 on spermatogenesis, following up on their previous report about the role of AGO4 in spermatogenesis, which resulted from an upregulation of AGO3 in Ago4-/- spermatocytes. However, the results are more difficult to interpret and ascertain the role of AGO3 in these cells, given the absence of any observable phenotype from Ago3 interruption. AGO4 regulates sex body formation, meiotic sex chromosome inactivation (MSCI), and miRNA production in spermatocytes, all of which were noted in the absence of both AGO3 and AGO4, with only an increased incidence of cells containing abnormal RNAPII at the sex chromosomes. It will be necessary to characterize how AGO3 regulates spermatogenic development, including meiotic progression and the regulation of the meiotic transcriptome, and compare these findings with the current observations to determine if the proposed mechanism involving AGO3, BRG1, and possibly AP2 is relevant in this context.

      __Response: __While we agree with Reviewer that a single Ago3 knockout will help understand distinct roles of AGO3 and AGO4 in spermatogenesis, the time and resources required to generate a new mouse model are substantial. The analysis included in this current manuscript has already taken over seven years, and with the lengthy production of a new single mutant mouse, validation of the new mouse, and then final analysis, we would be looking at another 3-5 years of analysis. In the current funding climate, and with strong concerns over ensuring reduction in utilization of laboratory mice, we consider this request to be far in excess of what is required to move this important story forward.

      The Ago413-/- mouse model has allowed us to associate a nuclear role of Argonaute proteins with a strong reproductive phenotype in the mouse germline. Given the redundancy between Ago3 and Ago4, it is likely that a single Ago3 knockout would have a mild phenotype just like the Ago4 KO. All this said, we agree with the reviewer that analysis of an Ago3 knockout mouse is a valuable next step, just not within this chapter of the story.

      1. Do Ago413-/- mice recapitulate the early meiotic entry phenotype observed in Ago4-/- mice? If not, could it be possible that AGO3 promotes meiotic entry, given its strong mRNA expression in spermatogonia according to the scRNAseq data (Fig. 2B)?

      Response: Our scRNA-seq data shows strong expression of Ago3 in spermatogonia, as mentioned by the Reviewer. Analysis of cell cycle marker expression also shows that the transcriptomic profile of spermatogonia is altered, with higher levels of transcripts corresponding to the later G2/M stages (Figure 2D). Moreover, Ago413 knockouts present an increase in the number of spermatogonial stem cells (Supplementary Figure S4B). However, this cluster represents a pool of quiescent and mitotically active cells entering meiosis, therefore interpretation of these data might be challenging. While specific experiments could be conducted to answer this question, this is outside of the scope of our manuscript. The manuscript as it stands is already rather large, and a full analysis of meiotic entry dynamics would dilute the core message relating to chromatin regulation in the sex body.

      1. The authors suggested that the removal of BRG1 by AGO3 is necessary during sex body formation and the eventual establishment of MSCI. However, the BAF complex subunit ARID1A has been shown to facilitate MSCI by regulating promoter accessibility. It will be interesting to determine how BRG1 distribution changes across the genome in the absence of AGO proteins and how that correlates with alterations in sex-linked gene expression.

      __Response: __We agree that changes in BRG1 distribution across the genome would be very interesting to identify. However, in this work we show that BRG1/SMARCA4 protein changes its localization in the sex body very rapidly between early and late pachynema. These two substages are only discernible by immunofluorescence using synaptonemal complex markers, as there are currently no available techniques to enrich for these subfractions. Therefore, studying the genome occupancy of BRG1 in these specific substages by techniques such as CUT&Tag is not currently possible. However, we are working on new methods to distinguish these cell populations and hope eventually to use these purification strategies to perform the studies suggested by this reviewer. Alternatively, the hope is that single cell CUT&Tag methods will become more reliable, and will enable us to address these questions. Neither of these options is currently available to us. The studies by Menon et al (2024-doi:10.7554/eLife.88024.5) provide strong evidence that ARID1A is needed to reduce promoter accessibility of XY silenced genes in prophase I through modulation of H3.3 distribution. However, this mechanism and our identification of the removal of BRG1 between early and late pachynema are not inconsistent with one another, as either SMARCA4 or SMARCA2 can associate with ARID1A as part of the cBAF complex, and ARID1A is not present in all forms of the BAF complex that contain BRG1. The difference between our results and those seen in Menon et al likely indicates that there are multiple forms of the BAF complex which are differentially regulated during MSCI and play different roles in silencing transcription. Further studies of specific BAF subunits are needed to elucidate how different flavors of the BAF complex act at specific genomic locations and meiotic time points.

      1. The observations presented in this manuscript (Fig. 1D, 2C, 3D, and 4) suggest a haploinsufficiency of the deleted locus in spermatogenic development. How does this compare with the ablation of either Ago3 or Ago4? Please explain.

      Response: Our previous studies in single Ago4 knockouts did not present a heterozygous phenotype (Modzelewski et al 2012, doi: 10.1016/j.devcel.2012.07.003, data not shown). Triple Ago413 knockouts show a much stronger fertility phenotype than single Ago4 knockout. Testis weight of Ago413 homozygous null present a 30% reduction while heterozygous mice show a 15% reduction (Figure 1D), comparable to the 13% reduction previously observed in Ago4-/- males. Sperm counts of Ago413 null and heterozygous males are reduced by 60% and 39% compared to wild type (Figure 1E), respectively, whereas Ago4 null mice have a milder phenotype, with only a 22% reduction in sperm counts. At the MSCI level, both homozygous and heterozygous Ago413 mutant spermatocytes show a similar increase in pachytene spermatocytes with increased RNA pol II ingression into the sex body with respect to wild-type of 35% and 30%, respectively. Ago4 single knockouts show an almost 18% increase in Pol II ingression when compared to wild type. These comparisons are now included in our manuscript in lines 170, 172 and 288. A milder phenotype of the Ago4 knockout and haploinsufficiency in triple Ago413 knockouts but not in Ago4 single knockouts is likely a consequence of the overlapping functions of Ago3 and Ago4 in mammals (and/or overexpression of Ago3 in Ago4 knockouts). In the context of their role in RISC, Wang et al (doi: 10.1101/gad.182758.111) studied the effects of single and double conditional knockouts for Ago1 and Ago2 in miRNA-mediated silencing. They discovered that the interaction between miRNAs and AGOs is highly correlated with the abundance of each AGO protein, and only double knockouts presented an observable phenotype.

      Minor Comments: Based on the interactome analysis, it was argued that AGO3 and AGO4 may function separately. Please discuss how AGO3 might compensate for AGO4 (Line 109).

      Response: We hypothesize that the combined function of AGO3 and AGO4 is needed for proper sex chromosome inactivation during meiosis. We base this hypothesis on the facts that (i) both proteins localize to the sex body in pachytene spermatocytes, (ii) loss of Ago4 leads to upregulation of Ago3, and (iii) the MSCI phenotype of Ago413 knockout mice is much stronger than that of the single Ago4 knockout (see above). However, AGO3 and AGO4 might not induce silencing through the same mechanism or pathway. In this work, we observed that their temporal expression in prophase I is different; while AGO3 protein seems to disappear by the diplotene stage, AGO4 is present in the sex body of these cells. Moreover, the proteomic analysis revealed a very low number of common interactors, an observation which could support the idea of AGO3 and AGO4 acting by different (albeit perhaps related) mechanisms to achieve MSCI. It is also possible that common interactors were not identified in our proteomic analysis due to the low abundance of AGO3 and AGO4 in the germ cells, limiting the resolution of the proteomics analysis (note that in order to visualize AGO proteins in WB experiments, at least 60 μg of enriched germ cell lysate must be loaded per lane). Moreover, given the difficulty in obtaining enough isolated pachytene and diplotene spermatocytes to perform immunoprecipitation experiments, we performed IP experiments in whole germ cell lysates, which limits the interpretation of our analysis. If AGO3 and AGO4 protein interactors overlap, then AGO3 would directly substitute for AGO4, leading to silencing in single Ago4 knockouts. However, if AGO3 and AGO4 work together through different, complementary mechanisms, then Ago4 mutant mice likely compensate for the loss of Ago4 by upregulating Ago3 along with specific interactors of the given pathway. We have added a sentence addressing this matter in line 411 of the results section and lines 506 and 513 of the discussion in the revised manuscript.

      In Line 221, it is unclear what is meant by 'cell cycle transcripts'. Does this refer to meiotic transcripts? It is also important to discuss the relevance of the G2/M cell cycle marker genes at later stages of meiotic prophase.

      Response: Thank you for this suggestion. We have changed the relevant text to remove redundancies and include more information. We agree that considering the importance of these genes across meiotic prophase is needed, as cells which are in the dividing stage will already have produced the proteins necessary for division. These cells likely correspond to the diplotene/M cluster cells that have a lower G2/M score, potentially causing the bimodal distribution seen in Figure 2D. We have added a sentence addressing this to the manuscript.

      While identified as a common interactor of both AGO3 and AGO4 in lines 440-445, HNRNPD is not listed among AGO4 interactors in Table S6. Please correct or explain this discrepancy.

      Response: HNRNPD was originally identified as an AGO4 interactor using a less strict criterion than the one used in our manuscript, where we required consistent enrichment in at least two rounds of IP-MS experiments. This reference to HNRNPD was a mistake, given that HNRNPD was only enriched in one of our three replicates. We apologize for the error and have removed the sentence in lines 440-445.
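The replicate criterion described in this response (enrichment in at least two IP-MS rounds) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `consistent_interactors`, the `min_replicates` parameter, and the placeholder protein names are all hypothetical, and the input data are toy values chosen only to show the filter.

```python
def consistent_interactors(enriched_by_replicate, min_replicates=2):
    """Return proteins enriched in at least `min_replicates` IP-MS replicates.

    `enriched_by_replicate` is a list with one collection of enriched
    protein names per replicate (hypothetical input format).
    """
    counts = {}
    for hits in enriched_by_replicate:
        # set() ensures a protein is counted once per replicate
        for protein in set(hits):
            counts[protein] = counts.get(protein, 0) + 1
    return sorted(p for p, n in counts.items() if n >= min_replicates)


# Toy data with placeholder names (not real results)
replicates = [
    {"proteinA", "proteinB", "proteinC"},
    {"proteinA", "proteinB"},
    {"proteinA"},
]
kept = consistent_interactors(replicates)
print(kept)
```

Under this kind of rule, a protein enriched in only one of three replicates (here `proteinC`) is excluded, which mirrors the reason given for withdrawing the HNRNPD statement.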

      It is unclear whether wild-type cell lysate or lysate containing FLAG-tagged AGO3 was used for BRG1 immunoprecipitation, and which antibody was used to detect AGO3 in the BRG1 IP sample. A co-IP experiment demonstrating interaction between BRG1 and wild-type AGO3 would be ideal in this context. Furthermore, co-localization by IF would be beneficial to determine the subcellular localization and the cell stages the interaction may be occurring. Additionally, co-IP and Western blot methodologies should be included in the methods section.

      __Response: __MYC-FLAG tagged AGO3 protein lysates were used for BRG1 Co-Immunoprecipitation, along with an anti MYC antibody to detect AGO3. This is now detailed in the Methods section of our revised manuscript (line 1133).

      Regarding BRG1 and AGO3 colocalization by IF, we can confidently show that both AGO3 and BRG1 localize to the sex chromosomes in early pachynema by comparing BRG1/SYCP3 and FLAG-AGO3/SYCP3 stained spreads. We were not able to show colocalization simultaneously on the same cells, given the lack of appropriate antibodies. Our anti FLAG antibody is raised in mouse, while anti BRG1 is raised in rabbit, therefore a non-rabbit, non-mouse anti SYCP3 would be needed to identify prophase I substages, and our lab does not possess such a validated antibody. However, we now have access to a multiplexing kit that allows to use same-species antibodies for immunofluorescence and we can perform these experiments for a revised manuscript.

      __Response: __The methods section now includes a description of the co-IP methodologies (line 1132). Western blot methodologies are explained in line 718, under the "Immunoblotting" title.

      In line 599, it is unclear what is meant by 'persistence of sex chromosome de-repression'. Please correct or clarify this.

      Response: This sentence has been changed and reads: "The persistence of sex chromosome gene expression".

      If possible, please add an illustration to summarize the findings together.

      Response: We thank the reviewer for this suggestion, and have now added this in Figure 6

      Significance

      Overall, this study enhances the understanding of gene expression regulation by AGO proteins during spermatogenesis. Several approaches, including functional, histological, and molecular characterization of the triple knockout phenotype, were instrumental in elucidating the role of AGO proteins in MSCI and meiotic as well as postmeiotic gene regulation. The main limitation of the study is that it is challenging to appreciate the role of AGO3 in addition to the previously published role of AGO4 without the inclusion of necessary control groups. Furthermore, the mechanism of action for AGO proteins in meiotic gene regulation was left relatively unexplored. This study presents new findings that will be significant for the research community interested in gene regulation, chromatin biology, and reproductive biology with the above suggestions considered.

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)): __

      The authors characterize a CRISPR-Cas9 mouse mutant that targets 3 genes that encode AGO family proteins, 2 of which are expressed during spermatogenesis (AGO3 and AGO4) and one that is said is not expressed, AGO1. This mouse mutant showed that AGO3 and AGO4 both contribute to spermatogenesis success as the "Ago413" mutation gave rise to an additive reduction in testis weight, due to spermatocyte apoptosis, and reduction in sperm count. Furthermore, they use insertion mouse mutants for Ago3 and Ago2 that express tagged versions of their corresponding proteins, which they use in combination with pan-AGO antibodies and Ago mutants to show differential expression and localization properties of AGO2, AGO3, and AGO4 (and the absence of AGO1) during spermatogenesis with a particular focus on meiotic prophase. They perform single-cell RNAseq and intricate analyses to demonstrate a change in distribution of meiotic stages in Ago413 mutants, and the overall cell cycle in spermatogonia and spermatocytes is altered. This analysis shows that the mutation leads to an inability to downregulate prior spermatogonia/spermatocyte stage transcripts in a timely manner. On the other hand, later-stage spermatocytes are abnormally expressing spermiogenesis genes. Similar to the Ago4 mutant previously characterized MSCI is disrupted. The authors also show that AGO3 has different interaction partners compared to AGO4 and focus their final assessment on a novel interaction partner of AGO3, BRG1. They show that this factor, which is involved in chromatin remodeling, is aberrantly localized to the sex body during meiotic prophase and diplonema. As BRG1 is involved in open chromatin, it is proposed that AGO3 restricts BRG1 (and related proteins) from the XY chromosome to ensure MSCI. Overall, this paper is very well constructed with mechanistic insights that make this a very impactful contribution to the research community. Major Comments:

      1. The abstract contains "Ago413-/- mouse" without any explanation of what that is. The abstract needs to be a stand-alone document that does not require any referencing for context.

      Response: We have included a sentence describing Ago413 in line 27

      Figure 2C. - The significance bars are confusing as they appear to overlap strangely.

      Response: We have modified this figure and now present the significance bars on top of the data points.

      On line 235, the authors state that "we first identified the top non-overlapping upregulated genes for Ago413+/+ germ cells in each cluster". Why did the authors not also select down-regulated genes in each cluster to perform a similar analysis?

      __Response: __Thank you for this question. As our goal was to identify genes that are markers of the transcriptional program in each cell type, we used only uniquely upregulated genes for each cluster. Genes that are downregulated in a cluster may be indicative of transcription in several other cell types, which is not easily interpretable. For a revised manuscript, we will perform this analysis to determine if there are any specific alterations in these downregulated genes.
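As an illustration of what "uniquely upregulated genes for each cluster" could mean in practice, the following is a minimal sketch under stated assumptions. It is not the authors' code: the function name `unique_upregulated_markers`, the `top_n` parameter, the cluster labels, and the gene names are hypothetical, and each input list is assumed to be already ranked by upregulation.

```python
from collections import Counter


def unique_upregulated_markers(up_by_cluster, top_n=None):
    """For each cluster, keep only genes upregulated in that cluster alone.

    `up_by_cluster` maps a cluster label to a ranked list of upregulated
    genes (hypothetical input format). Genes appearing in more than one
    cluster's list are dropped; the original ranking is preserved.
    """
    # Count in how many clusters each gene is upregulated
    counts = Counter(g for genes in up_by_cluster.values() for g in set(genes))
    result = {}
    for cluster, genes in up_by_cluster.items():
        unique = [g for g in genes if counts[g] == 1]
        result[cluster] = unique if top_n is None else unique[:top_n]
    return result


# Toy input with placeholder gene names (not real data)
toy = {
    "spermatogonia": ["geneA", "geneB", "geneC"],
    "pachytene": ["geneB", "geneD"],
    "round_spermatid": ["geneE", "geneC", "geneF"],
}
markers = unique_upregulated_markers(toy, top_n=2)
print(markers)
```

The same filter applied to downregulated genes would tend to retain fewer informative markers, since a gene that is low in one cluster is typically high in several others, which is consistent with the interpretability concern raised in the response.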

      Their Ago413 mutant characterization does a good job of assessing meiotic prophase and spermatozoa. However, their assessment of the stages in between these is lacking (meiotic divisions and spermiogenesis).

      Response: We understand the reviewer's concern, however, it is not usual to study stages between the first meiotic division and spermiogenesis because meiosis II is so rapid and thus we lack tools to dissect it. In general, any defect that impacts meiosis I (and particularly prophase I) leads to cell death during prophase I or at metaphase I due to strictly adhered checkpoints that eradicate defective cells. Thus, the increased TUNEL staining in prophase I indicates to us that defective cells are cleared before exit from meiosis I, and those cells progressing to the spermatid stage are "normal" for meiosis II progression. For these cells that did complete meiosis I and progressed normally through meiosis II, we analyzed their spermiogenic outcome extensively (see section entitled "Post-meiotic spermatids from Ago413-/- males exhibit defective spermiogenesis and poor spermatozoa function"). This section included extensive sperm morphology, sperm motility and sperm fertility through in vitro fertilization assays. That said, we have added a sentence on line 268 to explain the transit through meiosis II.

      The discovery of the interaction between BRG1 and AGO3 is exciting. They should assess BRG1 localization in later sub-stages, including late diplonema and diakinesis.

      __Response: __BRG1 (SMARCA4) was analyzed throughout prophase I, as shown in Figure 5G, and the quantification of fluorescence intensity included diplonema (Figure 5H-I). However, diakinesis was not included, since there was no observable BRG1 signal in these cells. We have explained this in line 459.

      ATF2 should have been assessed in more detail, as was done for BRG1 in Figure 5.

      __Response: __We agree with the Reviewer, however, staining of chromosome spreads with the anti ATF2 antibody was not possible in our hands after several attempts and changes in staining conditions. However, as staining of sections was successful, we showed localization of ATF2 on spermatocytes by co staining sections with SYCP3 and ATF2.

      Reviewer #3 (Significance (Required)): Overall, this paper is very well constructed with mechanistic insights, as described in my reviewer comments, that make this a very impactful contribution to the research community.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In their manuscript de las Mercedes Carro et al investigated the role of Ago proteins during spermatogenesis by producing a triple knockout of Ago 1, 3 and 4. They first describe the pattern of expression of each protein and of Ago2 during the differentiation of male germ cells, then they describe the spermatogenesis phenotype of triple knockout males, study gene deregulation by scRNA seq and identify novel interacting proteins by co-IP mass spectrometry, in particular BRG1/SMARCA4, a chromatin remodeling factor and ATF2 a transcription factor. The main message is that Ago3 and 4 are involved in the regulation of XY gene silencing during meiosis, and also in the control of autosomal gene expression during meiosis. Overall the manuscript is well written, the topic, very interesting and the experiments, well-executed. However, there are some parts of the methodology and data interpretation that are unclear (see below).

      Major comments

      1. Please clarify how the triple KO was obtained, and if it is constitutive or specific to the male germline. In the result section a Cre (which cre?) is mentioned but it is not mentioned in the M&M. On Figure S1, a MICER VECTOR is shown instead of a deletion, but nothing is explained in the text nor legend. Could the authors provide more details in the results section as well as in the M&M ? This is essential to fully interpret the results obtained for this KO line, and to compare its phenotype to other lines (such as lines 184-9 Comparison of triple KO phenotype with that of Ago4 KO). Also, if it is a constitutive KO, the authors should mention if they observed other phenotypes in triple KO mice since AGO proteins are not only expressed in the male germline.
      2. The paragraph corresponding to G2/M analysis is unclear to me. Why was this analysis performed? What does the heatmap show in Figure S4? What is G2/M score? (Fig 2D). Lines 219-220, do the authors mean that Pachytene cells are in a cell phase equivalent to G2/M? All this paragraph and associated figures require more explanation to clarify the method and interpretation.
      3. I have concerns regarding Fig2G: to be convincing, the analysis needs to be performed on several replicates, and it is essential to compare tubules of the same stage, which does not appear to be the case. In addition, immunofluorescent co-staining with markers of different cell types should be shown to demonstrate the earlier expression of some markers and their colocalization with markers of the earlier stages.
      4. One important question that I think the authors should discuss regarding their scRNAseq: clusters are defined using well characterized markers, but the Ago triple KO appears to alter the timing of expression of genes. Could this deregulation affect the interpretation of scRNAseq clusters and results?
      5. XY gene deregulation is mentioned throughout the result section, but only X chromosome genes seem to have been investigated. Even though the gene content of the Y is highly repetitive, it would be very interesting to show the level of expression of Y single copy and Y multicopy genes in a figure 3 panel.
      6. Can the authors elaborate on the observation that X gene upregulation is visible in the KO before MSCI; that is in lept/zygotene clusters (and in spermatogonia, if the difference visible in 3A is significant?)
      7. miRNA analysis: could the authors indicate if X encoded miRNA were identified and found deregulated? Because Ago4 has been shown to lead to a downregulation of miRNA, among which many X encoded. It is therefore puzzling to see that the triple KO does not recapitulate this observation. Were the analyses performed differently in the present study and in Ago4 KO study?
      8. The last results paragraph would also benefit from some additional information. It is not clear why the authors focused on enhancers and did not investigate promoters (or maybe they were but it's unclear). Which regions (size and location from TSS) were investigated for motif enrichment analyses? To what correspond the "transcriptional regulatory regions previously identified using dREG" mentioned in the M&M? I understand it's based on a previous article, but more info in the present manuscript would be useful.

      Minor comments

      1. In the introduction: The sentence "Ago1 is not expressed in the germline from the spermatogonia stage onwards allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis." is misleading because Ago1 is expressed at least in spermatogonia; It would be more precise to write "after spermatogonia stage" and rephrase the sentence. Otherwise it is surprising to see AGO1 protein in testis lysate and it is not in line with the scRNA seq shown in figure 2.
      2. Could the authors specify whether AGO proteins are expressed in other tissues? In somatic testicular cells?
      3. Pattern of expression: How do the authors explain that AGO3 disappears at the diplotene stage and reappears in spermatids?
      4. It would be useful to show the timing of expression of AGO 1 to 4 throughout spermatogenesis in the first paragraph of the article. Maybe the authors could present data from fig2B earlier?
      5. Line 190: please modify the sentence "reveal no differences in cellular architecture of the seminiferous tubules when compared to wild-type males" to " reveal no gross differences..." since even without quantification of the different cell types it is visible that KO seminiferous tubules are different from WT tubules.
      6. TUNEL analysis: please stage the tubules to determine the stage(s) at which apoptosis is the most predominant.
      7. Figure S4B does not show an increase of cells at Pachytene stage but at Lepto/zygotene stage (as well as an increase of spermatogonia). Please comment on this discrepancy with the results shown in Fig2.
      8. Fig5H and 5I are not mentioned in the result section. Also, it would be useful to label them with "all chromosomes" and "XY" to differentiate them easily
      9. Line 530 "data provide further evidence for a functional association between AGO-dependent small RNAs and heterochromatin formation, maintenance and/or silencing." Please rephrase, the present article does not really show that AGO nuclear role depends on small RNAs.
      10. Line 1256: replace "cite here " by appropriate reference
      11. Please use SMARCA4 instead of BRG1 name as it is its official name.

      Figures:

      Figure 1: Are the pictures shown for Ago3-tagged and floxed from the same stages ? The leptotene stage in 1A looks like a zygotene, while some pachytene/diplotene stage pictures do not look alike.

      Figure 1D, please label the Y scale properly (testis weight related to body weight)

      FigS1: Please comment on the presence of non-specific bands in the figure legend

      Fig 2E and F, please indicate on the figure (in addition to its legend), what are the X and Y axes respectively to facilitate its reading.

      2F: please use a clearer abbreviation for Spermatocyte than Sp (which could mean spermatogonia, sperm, etc.), such as Scyte I (same comment for Fig 3C)

      Overall, for all figures showing GSEA analyses, could the authors explain what a High positive NES and a High negative NES mean in the results section?

      Significance

      Ago proteins are known for their roles in post-transcriptional gene regulation via small RNA-mediated cleavage of mRNA, which takes place in the cytoplasm. Some Ago proteins have also been shown to be located in the nucleus, suggesting other non-canonical roles. This is the case for Ago4, which has been shown to localize to the transcriptionally silenced sex chromosomes (called the sex body) of the spermatocyte nucleus, where it contributes to regulating their silencing (Modzelewski et al 2012). Interestingly, Ago4 knockout leads to Ago3 upregulation, including on the sex body, indicating that Ago3 and Ago4 are involved in the same nuclear process. In their manuscript, de las Mercedes Carro et al. investigate the consequences of loss of both Ago3 and Ago4 in the male germline by the production of a triple knockout of Ago1, 3 and 4 in the mouse. With this model, the authors describe the role of Ago3 and Ago4 during spermatogenesis and show that they are involved in sex chromosome gene repression in spermatocytes and in round spermatids, as well as in the control of autosomal meiotic gene expression. Triple KO males have impaired meiosis and spermiogenesis, with fewer and abnormal spermatozoa resulting in reduced fertility. Since Ago1 male germline expression is restricted to pre-meiotic germ cells, it is not expected to contribute to the meiotic and postmeiotic phenotypes observed in the triple KO. The strengths of the study are i) the thorough analyses of mRNA expression at the single cell level, and in purified spermatocytes and spermatids (bulk RNAseq), ii) the identification of novel nuclear partners of AGO3/4 relevant for their described nuclear role: ATF2, which they show to also co-localize with the sex body, and BRG1/SMARCA4, a SWI/SNF chromatin remodeler. The main limitation of the study is the lack of information in the methods regarding the production of the triple KO, as well as some aspects of the transcriptome and motif analyses. It is also surprising to see that the triple KO does not recapitulate the miRNA deregulation observed in Ago4 KO. The characterization of a non-canonical role of AGO3/4 in male germ cells will certainly influence researchers of the field, and also interest a broader audience studying Argonaute proteins and gene regulation at transcriptional and posttranscriptional levels.

    1. Under pathological conditions

      Under pathological conditions, thickening of the alveolar interstitium, which lies in the alveolar wall where the alveolar epithelial cells and the alveolar cells are located, leads to an accumulation of the cells mentioned within this interstitium. Because of its location, it will also affect the alveolar epithelial cells -> loss of layers and hypertrophy of pulmonary endothelial cells -> loss of remodeling -> pulmonary capillaries affected -> impaired lung function

    2. Airway narrowing that leads to a prolonged expiratory time produces hyperinflation.

      That is, the infiltration of immune cells narrows the bronchial lumen, and the submucosal glands undergo hypertrophy and hyperplasia, causing airflow obstruction. This increases the residual volume, produces hyperinflation, and raises the V/Q ratio in some areas because of air trapping.

    1. Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069 i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
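
      The arithmetic behind these two examples can be checked with a few lines of Python. This is only a minimal sketch: the two r values are the ones quoted in the review, while the function name and dictionary labels are illustrative choices.

```python
# Convert the correlation coefficients quoted in the review into the share of
# variance explained (r squared) and the share left unexplained (1 - r squared).

def variance_explained(r: float) -> float:
    """Return r squared, the coefficient of determination of a simple linear fit."""
    return r * r

# r values quoted in the review; the labels are illustrative only
quoted = {
    "% time walked, 23 deg C vs 32 deg C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in quoted.items():
    r2 = variance_explained(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```

      Squaring removes the sign, so a negative correlation of the same magnitude explains the same share of variance as a positive one.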

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
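
      One possible reading of the in-group rank suggestion is sketched below: each individual's value is replaced by its rank within its own condition, and the ranks are then correlated. The data are hypothetical and the helper names are assumptions made for illustration; this is not the authors' analysis.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for the same six flies in two conditions.
# Condition B preserves the order of condition A but stretches it nonlinearly.
cond_a = [10.0, 12.0, 11.0, 15.0, 9.0, 14.0]
cond_b = [21.0, 30.0, 24.0, 80.0, 20.0, 45.0]

raw_r = pearson(cond_a, cond_b)
rank_r = pearson(average_ranks(cond_a), average_ranks(cond_b))
print(f"raw r = {raw_r:.3f}, in-group rank r = {rank_r:.3f}")
```

      Correlating within-condition ranks is equivalent to a Spearman correlation, so it rewards preserved rank order regardless of how the group mean or spread changes between conditions.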

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open hardware and open software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.

      Comments on revisions:

      I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications, including GLM analysis; however, some areas still require clarification for the benefit of an audience that includes geneticists.

      (1) GLM Analysis Explanation (Figure 9)

      While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:

      - The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other non-statistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.

      - The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.

      - The relationship between the GLM findings and their original correlation-based conclusions needs better integration and connection, leading the reader through your reasoning.

      (2) Documentation of Changes

      One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of revisions made. The authors should provide a more comprehensive summary of the specific changes implemented, particularly regarding:

      (3) Statistical Method Selection

      The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:

      - Why ridge regression was selected as the optimal method

      - How the regularization parameter (λ) was determined

      - How this choice affects the interpretation of environmental parameters' influence on individuality
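
      For readers unfamiliar with the method, the sketch below illustrates what ridge regression does to nearly collinear predictors and one common way to choose λ (leave-one-out cross-validation). It is a generic illustration under stated assumptions, not the authors' procedure: the data are hypothetical, both predictors and the response are assumed to be centered, and the function names are invented for this example.

```python
def ridge_2d(x1, x2, y, lam):
    """Closed-form ridge solution for two centered predictors (no intercept).

    Solves (X'X + lam * I) beta = X'y for a two-column design matrix X.
    """
    a = sum(v * v for v in x1) + lam
    d = sum(v * v for v in x2) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    g1 = sum(u * v for u, v in zip(x1, y))
    g2 = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)

def loo_cv_error(x1, x2, y, lam):
    """Mean squared prediction error under leave-one-out cross-validation."""
    n = len(y)
    err = 0.0
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        b1, b2 = ridge_2d([x1[j] for j in keep],
                          [x2[j] for j in keep],
                          [y[j] for j in keep], lam)
        err += (y[i] - b1 * x1[i] - b2 * x2[i]) ** 2
    return err / n

# Hypothetical, centered data with two nearly collinear predictors
x1 = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
x2 = [-1.9, -1.1, -0.4, 0.6, 0.9, 1.9]
y = [-4.1, -1.8, -1.2, 0.9, 2.2, 4.0]

grid = [0.01, 0.1, 1.0, 10.0]  # candidate values for lambda
for lam in grid:
    b1, b2 = ridge_2d(x1, x2, y, lam)
    print(f"lambda = {lam:>5}: beta = ({b1:+.3f}, {b2:+.3f}), "
          f"LOO error = {loo_cv_error(x1, x2, y, lam):.4f}")

best_lam = min(grid, key=lambda lam: loo_cv_error(x1, x2, y, lam))
print("lambda with lowest LOO error:", best_lam)
```

      Larger values of λ shrink the coefficients toward zero, which stabilizes estimates for correlated predictors but also means that coefficient magnitudes depend on λ. That dependence is why the reviewer's question about how λ was determined bears on interpretation.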

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: 

      (1) a large set of behavioral attributes, 

      (2) with inter-individual variability, that are 

      (3) stable over time. 

      A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      We thank the reviewer for his exceptionally kind assessment of our work!

      Weaknesses: 

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. 

      We have now uploaded a high-resolution PDF to the Github Address: https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf, and this is also mentioned in the figure legend for Fig. S8

      Why were five or so parameters selected from the full set? How were these selected? 

      The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.

      Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset? 

      As noted above, we only included a subset of parameters in our final analysis, as many were unsuitable for comparison across assays while still providing valuable assay-specific information that is important for relating these results to previous publications.

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency". 

      Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency” and decided to go with “stability” as the only descriptor, to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” with “consistency” throughout the current version of the manuscript in order to increase clarity and coherence.

      The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

      We agree with the reviewer that a multivariate analysis adds clear advantages in terms of statistical power, in addition to our chosen approach. On the one hand, we believe that the simplicity of our initial analysis, both for correlational and mean data, makes it easy for readers to understand and reproduce our results. While preparing the previous version of the manuscript we were skeptical, since more complex analyses often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparation of this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069 i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

      For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

      For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

      We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9). 

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      We agree that grouping our parameters into traits like exploration, attention, and anxiety always includes subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term “anxiety” in similar experiments (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge to compare insights from very distinct models.

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      We have now uploaded all codes and materials to GitHub and made them available as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      We thank the reviewer again for the extensive and constructive feedback.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths: 

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is differently defined, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and unbiased by any interpretation: we only modified what we stated in the experimental description, without designing a specific situation that might itself be perceived differently by different individuals. For example, comparing a context with a predator and one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other fraction does not.

      The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?). 

      The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages like more statistical power and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend the initial approach we took, since we think that the simplicity of the analysis for both correlational and mean data is easy to understand and reproduce. More complex analyses necessarily include the selection of a specific statistical toolbox by the experimenters and based on these decisions, different analyses become less comparable and more and more complicated to reproduce, unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, leading to very surprising results (Paul et al., 2024), since some paths even reversed their findings. Such a variance in conclusions is hardly possible with the rather simplistic and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for rather simple and easy to understand analyses.

      We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).
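
      As a rough illustration of the variance partitioning that a random-intercept mixed model formalizes, the sketch below computes an ANOVA-based repeatability (intraclass correlation) for a balanced design. It is a simplified stand-in for a full mixed model, not the analysis in the manuscript: the measurements are hypothetical and the function name is an assumption made for this example.

```python
def repeatability(groups):
    """ANOVA-based repeatability (ICC) for a balanced design.

    groups: one inner list of repeated measurements per individual,
            all inner lists of equal length (at least two values each).
    Returns the share of total variance attributable to differences
    between individuals, a value between 0 and 1.
    """
    k = len(groups)          # number of individuals
    n = len(groups[0])       # measurements per individual
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((v - m) ** 2
                    for g, m in zip(groups, means)
                    for v in g) / (k * (n - 1))
    # among-individual variance component, truncated at zero
    var_among = max((ms_between - ms_within) / n, 0.0)
    return var_among / (var_among + ms_within)

# Hypothetical data: four flies, three repeated measurements each
flies = [
    [10.1, 10.4, 9.8],
    [12.0, 12.3, 11.9],
    [8.7, 9.0, 8.9],
    [11.2, 10.9, 11.4],
]
print(f"repeatability = {repeatability(flies):.3f}")
```

      A full hierarchical model would additionally estimate fixed effects of situation and the covariance of individual intercepts across situations, which this one-way decomposition cannot do.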

      Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial. 

      We apologize for this inconvenience. A detailed overview of male and female sample sizes has been listed in the supplemental boxplots next to the plots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

      I don't necessarily doubt the robustness of the results and my guess is that the authors' interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

      As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

      Reviewer #3 (Public Review): 

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days. 

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others. 

      We thank the reviewer for this extraordinarily kind assessment of our work!

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors): 

      While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

      (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

      (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

      Reviewer #2 (Recommendations For The Authors): 

      I want to start by saying this seems like a really cool study! It's an impressive amount of work and addressing a pretty basic question that is interesting (at least I think so!)

      We thank the reviewer again for this assessment!

      That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data and just doing Pearson correlation coefficients across time points and situations is very clunky and actually losing out on a lot of information. The most up-to-date, state-of-the-art approaches are mixed models - these models can handle very complex (or not so complex) random structures which can estimate variance and importantly, covariance, in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models!

      As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).
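
      To illustrate the variance partitioning the reviewer refers to, here is a minimal sketch of an ANOVA-based repeatability estimate (the intraclass correlation for a balanced design). It is a simpler stand-in for a random-intercept mixed model, not the analysis used in the manuscript; the function name and the nested-list data layout are assumptions made for illustration.

```python
from statistics import mean


def repeatability(scores):
    """One-way ANOVA estimate of repeatability (ICC) for balanced data.

    scores: list with one entry per individual; each entry is a list of
    k repeated measurements of the same behaviour (hypothetical layout).
    Returns the share of total variance attributed to between-individual
    differences. The estimate can be negative when individuals differ
    less than expected from within-individual noise alone.
    """
    n = len(scores)            # number of individuals
    k = len(scores[0])         # repeated measurements per individual
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]

    # Mean squares between and within individuals.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    # Between-individual variance component, then its share of the total.
    var_between = (ms_between - ms_within) / k
    return var_between / (var_between + ms_within)
```

      A mixed model generalises this idea by estimating the same variance components while also handling unbalanced data, covariates, and covariance across situations.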

      Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering? 

      We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.

      Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

      We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the Indytrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (as otherwise, sample sizes will drop over time, and the flies will have gone through too many experimental alternations). To make this clearer, we have elaborated on this point in the main text, and we added group sample sizes to figure legends.

      What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

      We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text, by focusing on the term ‘group’. By ‘group’ we refer to the average of all individuals tested in the same situation. Sample sizes in the figure legends now indicate group/population sizes to make this clearer.

      Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

      This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

      We appreciate this suggestion. Initially, we were hoping that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers now led to a series of additions to both main text and methods, leading to a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.

    1. frequency (MAF) > 5%). The data was split by chromosome (Chr 1–7, 9–22, X, Y for training; Chr 8 for testing)

      Nice, clever guard against leakage.
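
      The chromosome-based split quoted above can be sketched as follows. This is a minimal illustration under assumed names: the record layout (a dict with a `chrom` key) and the function name are hypothetical and not taken from the annotated paper.

```python
# Chromosomes held out for testing, as stated in the quoted passage.
TEST_CHROMS = {"8"}


def split_by_chromosome(variants, test_chroms=TEST_CHROMS):
    """Split variant records into train and test sets by chromosome.

    variants: iterable of dicts with a "chrom" key such as "chr8" or "8"
    (hypothetical record layout). Every variant on a held-out chromosome
    goes to the test set, so nearby, correlated variants never straddle
    the train/test boundary.
    """
    train, test = [], []
    for v in variants:
        chrom = str(v["chrom"])
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]  # normalise "chr8" to "8"
        (test if chrom in test_chroms else train).append(v)
    return train, test
```

      Splitting at the chromosome level rather than at random is what guards against leakage between neighbouring variants.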

    1. Author response:

      The following is the authors’ response to the original reviews

      Overview of changes in the revision

      We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:

      (1) We expanded the discussion of the relevant literature in children and adults.

      (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.

      (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.

      (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model and this led to a different model from the original being the preferred model. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.

      Reviewer #1 (Public review):

      Summary:

      Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.

      Thank you for reviewing our work. One note of clarification. This work focuses on reinforcement-based learning throughout development but does not evaluate sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).

      The goal is to understand how children at different ages adjust their movements in response to reward feedback but does not evaluate sensorimotor adaptation. We now explain this distinction on line 35.

      Strengths:

      The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.

      Thank you.

      Weaknesses:

      Potential concerns involve inconsistent findings with secondary analyses, current assumptions that impact both interpretation and computational modelling, and a lack of clearly stated hypotheses.

      (1) Multiple regression and Mediation Analyses.

      The challenge with these secondary analyses is that:

      (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,

      (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and

      (c) The authors already have a trial-by-trial model that is arguably more insightful.

      Given this, some suggested changes are to:

      (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.

      (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.

      Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.

      (2) Variability for different phases and model assumptions:

      A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).

      However, in Figure 6, one visually observes greater variability during the success clamp (where it is assumed variability only comes from motor noise) compared to baseline, where variability would come from:

      (a) Motor noise.

      (b) Likely some exploratory noise since there were some failures.

      (c) Updates in reach aim.

      Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.

      This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.

      We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).

      Given the comment above, can the authors please:

      (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.

      Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.

      (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?

      Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.

      (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.

      We have now compared 31 models (see full details in next response) which include the 7 models in Roth et al. (2023). Several of these model variants have updating even after success (with so-called planning noise). We also now fit the model to the data that includes the clamp phases (we can't easily fit to success clamp alone as there are only 10 trials). We find that the preferred model is one that does not include updating after success.

      (d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al. (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of a), b), and c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.

      Thank you for this. We realized that the models presented in Roth et al. (2023) as well as in other papers, are all special cases of a more general model. Moreover, in total there are 30 possible variants of the full model so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit by Kalman smoother to the actual data (rather than to summary statistics which has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model we then refit this model to all trials including the clamps so as to obtain the best parameter estimates.
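
      The BIC-based model selection step described above can be sketched as follows. The sketch assumes each candidate model has already been fit and yields a maximised log-likelihood; the function names and the `fits` dictionary layout are hypothetical, and the Kalman-smoother fitting itself is not shown.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(fits, n_obs):
    """Pick the model with the lowest BIC.

    fits: dict mapping a model name to a tuple of
          (maximised log-likelihood, number of free parameters).
    n_obs: number of observations used in the fit (for example, the
           number of learning trials entering the likelihood).
    Returns the preferred model name and the BIC of every model.
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

      Because the penalty term grows with the number of free parameters, a model with extra parameters is preferred only if it improves the log-likelihood enough to offset that penalty.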

      The preferred model was the same whether we combined the continuous and discrete probabilistic data or just examined each task separately either for only the children or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018) and has exploration variability (after failure) and motor noise with full updating with exploration variability (if any) after success. This model differs from the model in the original submission which included a partial update of the desired reach after exploration; this was considered the learning rate. The current model suggests a unity learning rate.

      In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.

      We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.
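
      One reading of the preferred model described in this response can be sketched as a short simulation. This is an interpretation of the verbal description only (exploration added after a failed trial, motor noise on every trial, and the desired reach fully updated with any exploration when a trial succeeds); the function name, parameter names, and the `rewarded` callback are assumptions, not the authors' code.

```python
import random


def simulate(n_trials, sigma_motor, sigma_explore, rewarded, aim=0.0, seed=0):
    """Simulate reaches under an exploration-after-failure model.

    rewarded: function mapping a reach location to True (success) or
              False (failure); a hypothetical stand-in for the task's
              reward rule.
    Returns the list of reach locations, one per trial.
    """
    rng = random.Random(seed)
    prev_failed = False
    reaches = []
    for _ in range(n_trials):
        # Exploration is zero-mean Gaussian and occurs only after failure.
        explore = rng.gauss(0.0, sigma_explore) if prev_failed else 0.0
        # Motor noise corrupts every reach but is never learned from.
        reach = aim + explore + rng.gauss(0.0, sigma_motor)
        success = rewarded(reach)
        if success:
            # Unity learning rate: keep the full exploratory shift.
            aim += explore
        prev_failed = not success
        reaches.append(reach)
    return reaches
```

      Under this sketch a success clamp (every trial rewarded) produces variability from motor noise alone, whereas a failure clamp adds exploration variability on every trial after the first.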

      (e) line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text

      This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.

      (3) Hypotheses:

      The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.

      We have included our hypotheses and predictions at two points in the paper. In the introduction we modified the text to:

      "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."

      In results we modified the sentence to:

      "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure.”

      Reviewer #2 (Public review):

      Summary:

      In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.

      Strengths:

      (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.

      (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.

      (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.

      (4) The main and supplemental figures are clear and concise.

      Thank you.

      Weaknesses:

      (1) Framing.

      One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!

      Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our task has features not present in typical cognitive tasks.

      (2) Links to other scholarship.

      If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.

      We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task. We now discuss that exploration is not one size fits all; its benefits vary depending upon the task. We have added the following paragraphs to the Discussion section:

      "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age indicating that adults explore less compared to children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maxima. In our task by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but minimal if any role in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of increasing exploration as motor noise decreases.

      A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."

      (3) Modeling.

      First, I may have missed something, but it is unclear to me if the model is actually accounting for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45°, but then don't get one at 40°, I should be more likely to try 50° next than 35°). I couldn't tell from the current equations if this was the case, or if exploration was essentially "unsigned," nor if the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this was more clear in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.

      The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.

      The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.

      Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).

      In some sense, the learning rate in the model in the original submission is highly related to perseveration. For example if the learning rate is 0, then there is complete perseveration as you simply repeat the same desired movement. If the rate is 1, there is no perseveration and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate learning rate from perseveration. Adding perseveration as another parameter would likely make it and the learning unidentifiable. However, we now compare 31 models and those that have a non-unity learning rate are not preferred suggesting there is little perseveration.
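
      The link between learning rate and perseveration described above can be made concrete with a one-line update rule. This is a sketch of a generic partial-update rule, assumed here for illustration; the exact form used in the original submission is not given in this response, and the function and argument names are hypothetical.

```python
def update_aim(aim, explored_reach, learning_rate):
    """Move the desired reach a fraction of the way toward the explored reach.

    learning_rate = 0 repeats the same desired movement (complete
    perseveration); learning_rate = 1 adopts the explored reach in full
    (no perseveration); values in between give partial perseveration.
    """
    return aim + learning_rate * (explored_reach - aim)
```

      Because a single parameter already spans the range from full perseveration to none, a separate perseveration parameter would trade off against it, which is the identifiability concern raised in the response.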

      (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?

      We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.

      Reviewer #3 (Public review):

      Summary:

      The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.

      Strengths:

      (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.

      (2) A large sample of participants were recruited.

      (3) The model-based analysis provides further insights into the development of reinforcement learning ability.

      Thank you.

      Weaknesses:

      (1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.

      Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.

      (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.

      We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to those that also refer to this as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.

      The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below) has resulted in key recommendations for increasing the significance and strength of evidence:

      To increase the Significance of the findings, please consider the following:

      (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.

      We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.

      (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.

      Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.

      To move the "Strength of Evidence" to "Convincing", please consider doing the following:

      (1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments reviewers 1 and take the following additional steps: plotting r squares for individual participants, discussing whether individual values of the fitted parameters are plausible and whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.

      We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).

      (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).

      Please see response to public review comments of Reviewer #1 in which we explain that we have now fit a value-based model as well as 31 other noise-based models providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that 32 models span a reasonable amount of space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.

      (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).

      Please see response to public review comments of Reviewer #1. We chose to focus only on the model-based analysis because it allowed us to distinguish between exploration variability and motor noise.

      Please see below for further specific recommendations from each reviewer.

      Reviewer #1 (Recommendations for the author):

      (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.

      Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).

      (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved. “select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)"

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.

      We use mechanistic in its standard sense in modeling, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models as to what a system is trying to achieve). In mechanistic modeling one proposes a mechanism, and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer), but both are considered mechanistic, just at different levels. Quoting Levenstein: "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.

      (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.

      Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.

      (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?

      Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one-way ANOVA results for each task in Supplementary Table 1.

      (7) Line 98. Please state that participants received reinforcement feedback during baseline.

      Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.

      (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?

      Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:

      "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."

      (9) The term learning distance could be improved. Perhaps use distance from target.

      Thank you for this comment. We appreciate that learning distance defined with 0 as the best value is counterintuitive. We have changed the language to "distance from target" as the learning metric.

      (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.

      There appears to be no Supplemental 2 in the referenced paper, so we assume the reviewer is referring to Supplemental B, which deals with a shuffling procedure to examine lag-1 correlations.

      In our tasks, we have only 9 trials to analyze in each clamp phase, so we do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' but are instead calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate the s.d. in the clamp phase (because in simulations we ensure the data do not fall outside the workspace), which makes the previous analytic formulas an approximation; they are no longer used.

      (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).

      Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task: to move in a way that gets the video clip to play. We intentionally designed the tasks to be simple, such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and that the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks, where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3- to 5-year-olds who exhibited more mature learning ability, which gives us confidence that the younger children were able to understand the task goal.

      (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).

      Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.

      (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?

      Thank you for this comment. While motivation most probably plays a role in learning, particularly in a game environment, it was outside the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the Discussion section to address this comment:

      "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks however our study was not designed to test these processes directly and future work would benefit from exploring these other components more explicitly."

      (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.

      Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.

      Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.

      Reviewer #2 (Recommendations for the author):

      (1) The path length ratio analyses in the supplemental are interesting but are not mentioned in the main paper. I think it would be helpful to mention these as they are somewhat dramatic effects.

      Thank you for this comment. Path length ratios are defined in the Methods and results are briefly summarized in the Results section with a point to the supplementary figures. We have updated the text to more explicitly report the age related differences in path length ratios.

      (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).

      Thank you for this comment. We have added an additional motivating sentence to the introduction.

      Reviewer #3 (Recommendations for the author):

      The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution to motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.

      Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.

      My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.

      We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.
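
      As a rough illustration of how a BIC-based comparison (the relative goodness-of-fit measure mentioned later in this response) ranks candidate models, the sketch below computes BIC from a maximized log-likelihood and a parameter count. The model names, log-likelihoods, and parameter counts are hypothetical placeholders and are not values from the paper; only the trial count of 120 is taken from this response.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def rank_models(fits, n_obs):
    """Return (name, BIC) pairs sorted from best (lowest BIC) to worst.

    `fits` maps a model name to (maximized log-likelihood, free parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical fits for illustration only; not results from the paper.
example_fits = {
    "motor_noise_only": (-310.0, 1),
    "noise_plus_exploration": (-295.0, 2),
    "noise_exploration_learning_rate": (-294.5, 3),
}
ranking = rank_models(example_fits, n_obs=120)
best_name, best_bic = ranking[0]
```

      In this toy case the third parameter improves the log-likelihood only slightly, so the BIC penalty for the extra parameter outweighs the gain and the two-parameter variant ranks first.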

      Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the r-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, r squares, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.

      We now show fits to individual subjects. However, because the fit uses a Kalman smoother, it fits the data perfectly by generating its best estimate of motor noise and exploration variability on each trial to fully account for the data. In that sense R² is always 1, which is not informative.

      While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit such as standard R² for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output. Each time the model is simulated with the fit parameters it produces a different output (due to motor noise, exploration variability and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability and reward stochasticity of a trial. For example, the model could fit pure Gaussian motor noise across trials (for a poor learner) by accurately fitting the standard deviation of motor noise but would not be expected to actually match each data point, so it would have a traditional R² of 0.

      To provide an overall goodness of fit we have to reduce the noise component, and to do so we examined the traditional R² between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant) so as to reduce the stochastic variation. The results for the continuous probabilistic and discrete probabilistic tasks are R² of 0.41 and 0.72, respectively.
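
      The averaging procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the learner (`simulate_learner`, an exponential approach to a target with additive Gaussian noise) is a hypothetical stand-in and not the model from the paper, and the stand-in "data" are drawn from that same toy process. Only the structure follows the text: take the trial-by-trial median of 1000 simulations per participant, average across participants, and compute R² against the averaged data.

```python
import random
import statistics

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def simulate_learner(rng, n_trials, noise_sd, rate=0.1):
    """Toy learner (an assumption): approach 1.0, plus Gaussian noise."""
    return [1.0 - (1.0 - rate) ** t + rng.gauss(0.0, noise_sd)
            for t in range(n_trials)]

def median_simulation(rng, n_trials, noise_sd, n_sims):
    """Trial-by-trial median across repeated simulations of one participant."""
    runs = [simulate_learner(rng, n_trials, noise_sd) for _ in range(n_sims)]
    return [statistics.median([run[t] for run in runs])
            for t in range(n_trials)]

def mean_across(series_list):
    """Trial-by-trial mean across participants."""
    return [statistics.fmean(values) for values in zip(*series_list)]

rng = random.Random(0)
n_participants, n_trials, noise_sd = 20, 40, 0.5

# Stand-in "data": one noisy run per participant from the same toy process.
data = [simulate_learner(rng, n_trials, noise_sd)
        for _ in range(n_participants)]
model = [median_simulation(rng, n_trials, noise_sd, n_sims=1000)
         for _ in range(n_participants)]

# One participant against one fresh simulation: noise dominates.
single_r2 = r_squared(data[0], simulate_learner(rng, n_trials, noise_sd))
# Group average against the averaged median simulations: noise is reduced.
group_r2 = r_squared(mean_across(data), mean_across(model))
```

      In this toy setting the single-run comparison yields a much lower R² than the group-averaged comparison, even though the generating process is identical, which is the reason for averaging before computing R².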

      Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases over age from the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.

      For the success clamp, we have only 9 trials from which to calculate variability, which limits our power to detect a significant effect of age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data, which we now show overlaid on the fits of the model for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match, and although the variance explained is 16% and 56% (we limit to 9 trials so as to match the fail clamp), the correlations are 0.52 and 0.78, suggesting a reasonable relation, although there may be other small sources of variability not captured in the model.
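
      To give a sense of why 9 trials limit power, the sketch below compares how much a sample standard deviation fluctuates when it is computed from 9 versus 120 independent Gaussian draws. This is only an illustration under simplifying assumptions: the paper's model estimates motor noise jointly with other parameters rather than as a plain sample SD, and the unit true SD and repeat count are arbitrary choices.

```python
import random
import statistics

def sd_estimates(rng, n_trials, true_sd, n_repeats):
    """Sample SDs from repeated draws of `n_trials` Gaussian reach errors."""
    estimates = []
    for _ in range(n_repeats):
        errors = [rng.gauss(0.0, true_sd) for _ in range(n_trials)]
        estimates.append(statistics.stdev(errors))
    return estimates

rng = random.Random(1)
# Spread (SD) of the SD estimate itself, for 9 and for 120 trials.
spread_9 = statistics.stdev(sd_estimates(rng, 9, 1.0, 2000))
spread_120 = statistics.stdev(sd_estimates(rng, 120, 1.0, 2000))
```

      For Gaussian data the relative standard error of a sample SD is roughly 1/sqrt(2(n - 1)), so an estimate from 9 trials is expected to be several times noisier than one from 120 trials.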

      Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".

      After removing that point from the original Fig. 5C, the correlation was still significant, and we feel the plots mentioned in the previous point add useful information on this issue. With the new model this figure has changed.

      It is still a concern that the young children did not understand the instructions. Nine 3-to-8-year-old children (out of 48) were better explained by the noisy only model than the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noisy-only model. It appears that there is a higher percentage of the "young" children who didn't get the instruction than the older ones.

      Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straightforward. The child simply needs to understand the underlying goal to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.

      We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach and, crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65 and 86% of children (out of N = 21, 22, 20 and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5 year old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.

      The reduced numbers fit by the preferred model for the other conditions likely reflect differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.
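
      The likelihood ratio test used above can be sketched as follows for two nested models. This is a generic illustration rather than our analysis code: the log-likelihood values and the number of extra parameters in the example are hypothetical, and the chi-square survival function is written out by hand so the sketch relies only on the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1), with
    a = df / 2 and z = x / 2, starting from the closed forms for df = 1, 2.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-z), 1.0           # df = 2
    else:
        q, a = math.erfc(math.sqrt(z)), 0.5  # df = 1
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(ll_reduced, ll_full, extra_params):
    """Return (statistic, p-value) for nested models.

    The statistic 2 * (ll_full - ll_reduced) is compared with a chi-square
    distribution whose df equals the number of extra free parameters.
    """
    statistic = 2.0 * (ll_full - ll_reduced)
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical example: the full model adds 2 parameters to a noise-only model.
stat, p_value = likelihood_ratio_test(ll_reduced=-105.0, ll_full=-100.0,
                                      extra_params=2)
full_model_preferred = p_value < 0.05  # threshold chosen for illustration
```

      A participant would be counted as better fit by the fuller model when the p-value falls below the chosen threshold; the proportions reported above are then the fraction of such participants in each condition.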

      Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.

      Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example, in Figure 2—figure Supplement 1, column 1, row 1 is a mouse-using participant age 3 to 5 years old, while column 3, row 2 is a touch-screen-using participant age 6 to 8 years old. We have edited the labeling on both figures to make the arrangement of the data clearer.

      Line 222: make this a complete sentence.

      This sentence has been edited to a complete sentence.

      Line 331: grammar.

      This sentence has been edited for grammar.

    1. Author response:

      [The following is the authors’ response to the original reviews.]

      We extend our sincere thanks to the editor, referees for eLife, and other commentators who have written evaluations of this manuscript, either in whole or in part. Sources of these comments were highly varied, including within the bioRxiv preprint server, social media (including many comments received on X/Twitter and some YouTube presentations and interviews), comments made by colleagues to journalists, and also some reviews of the work published in other academic journals. Some of these are formal and referenced with citations. Others were informal but nonetheless expressed perspectives that helped enable us to revise the manuscript with the inclusion of broader perspectives than the formal review process. It is beyond the scope of this summary to list every one of these, which have often been brought to the attention of different coauthors, but we begin by acknowledging the very wide array of peer and public commentary that has contributed to this work. The reaction speaks to a broad interest in open discussion and review of preprints.

      As we compiled this summary of changes to the manuscript, we recognized that many colleagues made comments about the process of preprint dissemination and evaluation rather than the data or analyses in the manuscript. Addressing such comments is outside the scope of this revised manuscript, but we do feel that a broader discussion of these comments would be valuable in another venue. Many commentators have expressed confusion about the eLife system of evaluation of preprints, which differs from the editorial acceptance or rejection practiced in most academic journals. As authors in many different nations, in varied fields, and in varied career stages, we ourselves are still working to understand how the academic publication landscape is changing, and how best to prepare work for new models of evaluation and dissemination. 

      The manuscript and coauthor list reflect an interdisciplinary collaboration. Analyses presented in the manuscript come from a wide range of scientific disciplines. These include skeletal inventory, morphology, and description; spatial taphonomy; analysis of bone fracture patterns and bone surface modifications; sedimentology; geochemistry; and traditional survey and mapping. The manuscript additionally draws upon a large number of previous studies of the Rising Star cave system and the Dinaledi Subsystem, which have shaped our current work. No analysis within any one area of research stands alone within this body of work: all are interpreted in conjunction with the outcomes of other analyses and data from other areas of research. Any single analysis in isolation might be consistent with many different hypotheses for the formation of sediments and disposition of the skeletal remains. But testing a hypothesis requires considering all data in combination and not leaving out data that do not fit the hypothesis. We highlight this general principle at the outset because a number of the comments from referees and outside specialists have presented alternative hypotheses that may arguably be consistent with one kind of analysis that we have presented, while seeming to overlook other analyses, data, or previous work that exclude these alternatives. In our revision, we have expanded all sections describing results to consider not only the results of each analysis, but how the combination of data from different kinds of analysis relate to hypotheses for the deposition and subsequent history of the Homo naledi remains. We address some specific examples and how we have responded to these in our summary of changes below.

      General organization:

      The referee and editor comments are mostly general and not line-by-line questions, and we have compiled them and treated them as a group in this summary of changes, except where specifically noted. 

      The editorial comments on the previous version included the suggestion that the manuscript should be reorganized to test “natural” (i.e. noncultural) hypotheses for the situations that we examine. The editorial comment suggested this as a “null hypothesis” testing approach. Some outside comments also viewed noncultural deposition as a null hypothesis to be rejected. We do not concur that noncultural processes should be construed as a null hypothesis, as we discuss further below. However, because of the clear editorial opinion we elected to revise the manuscript to make more explicit how the data and analyses test noncultural depositional hypotheses first, followed by testing of cultural hypotheses. This reorganization means that the revised manuscript now examines each hypothesis separately in turn. 

      Taking this approach resulted in a substantial reorganization of the “Results” section of the manuscript. The “Results” section now begins with summaries of analyses and data conducted on material from each excavation area. After the presentation of data and analyses from each area, we then present a separate section for each of several hypotheses for the disposition and sedimentary context of the remains. These hypotheses include deposition of bodies upon a talus (as hypothesized in some previous work), slow sedimentary burial on a cave floor or within a natural depression, rapid burial by gravity-driven slumping, and burial of naturally mummified remains. We then include sections to test the hypothesis of primary cultural burial and secondary cultural burial. This approach adds substantial length to the Results. While some elements may be repeated across sections, we do consider the new version to be easier to take piece by piece for a reader trying to understand how each hypothesis relates to the evidence. 

      The Results section includes analyses on several different excavation areas within the Dinaledi Subsystem. Each of these presents somewhat different patterns of data. We conceived of this manuscript combining these distinct areas because each of them provides information about the formation history of the Homo naledi-associated sediments and the deposition of the Homo naledi remains. Together they speak more strongly than separately. In the previous version of the manuscript, two areas of excavation were considered in detail (Dinaledi Feature 1 and the Hill Antechamber Feature), with a third area (the Puzzle Box area) included only in the Discussion and with reference to prior work. We now describe the new work undertaken after the 2013-2014 excavations in more detail. This includes an overview of areas in the Hill Antechamber and Dinaledi Chamber that have not yielded substantial H. naledi remains and that thereby help contextualize the spatial concentration of H. naledi skeletal material. The most substantial change in the data presented is a much expanded reanalysis of the Puzzle Box area. This reanalysis provides greater clarity on how previously published descriptions relate to the new evidence. The reanalysis also provides the data to integrate the detailed information on bone identification, fragmentation, and spatial taphonomy from this area with the new excavation results from the other areas.

      In addition to Results, the reorganization also affected the manuscript’s Introduction section. Where the previous version led directly from a brief review of Pleistocene burial into the description of the results, this revised manuscript now includes a review of previous studies of the Rising Star cave system. This review directly addresses referee comments that express some hesitation to accept previous results concerning the structure and formation of sediments, the accessibility of the Dinaledi Subsystem, the geochronological setting of the H. naledi remains, and the relation of the Dinaledi Subsystem to nearby cave areas. Some parts of this overview are further expanded in the Supplementary Information to enable readers to dive more deeply into the previous literature on the site formation and geological configuration of the Rising Star cave system without needing to digest the entirety of the cited sources. 

      The Discussion section of the revised manuscript is differentiated from Results and focuses on several areas where the evidence presented in this study may benefit from greater context. One new section addresses hypothesis testing and parsimony for Pleistocene burial evidence, which we address at greater length in this summary below. The majority of the Discussion concerns the criteria for recognizing evidence for burial as applied in other studies. In this research we employ a minimal definition but other researchers have applied varied criteria. We consider whether these other criteria have relevance in light of our observations and whether they are essential to the recognition of burial evidence more broadly. 

      Vocabulary:

      We introduce the term “cultural burial” in this revised manuscript to refer to the burial of dead bodies as a mortuary practice. “Burial” as an unmodified term may refer to the passive covering of remains by sedimentary processes. Use of the term “intentional burial” would raise the question of interpreting intent, which we do not presume based on the evidence presented in this research. The relevant question in this case is whether the process of burial reflects repeated behavior by a group. As we received input from various colleagues it became clear that burial itself is a highly loaded term. In particular there is a common assumption within the literature and among professionals that burial must by definition be symbolic. We do not take any position on that question in this manuscript, and it is our hope that the term “cultural burial” may focus the conversation around the extent that the behavioral evidence is repeated and patterned. 

      Sedimentology and geochemistry of Dinaledi Feature 1:

      Reviewer 4 provided detailed comments on the sedimentological and geochemical context that we report in the manuscript. One outside review (Foecke et al. 2024) included some of the points raised by reviewer 4, and additionally addressed the reporting of geochemical and sedimentological data in previous work that we cite. 

      To address these comments we have revised the sedimentary context and micromorphology of sediments associated with Dinaledi Feature 1. In the new text we demonstrate the lack of microstratigraphy (supported by grain size analysis) in the unlithified mud clast breccia (UMCB), while such a microstratigraphy is observed in the laminated orange-red mudstones (LORM) that contribute clasts to the UMCB. Thus, we emphasize the presence and importance of a laterally continuous layer of LORM nature occurring at a level that appears to be the maximum depth of fossil occurrence. This layer is severely broken under extensive accumulation of fossils such as Feature 1 and only evidenced by abundant LORM clasts within and around the fossils. 

      We have completely reworked the geochemical context associated with Feature 1 following the comments of reviewer 4. We described the variations and trends observed in the major oxides separately from trace and rare-earth elements. We used Harker variation plots to assess relationships between these element groups with CaO and Zn, followed by principal component analysis of all elements analyzed. The new geochemical analysis clearly shows that Feature 1 is associated with localized trace element signatures that exist in the sediments only in association with the fossil bones, which suggests a lack of postdepositional mobilization of the fossils and sediments. We additionally have included a fuller description of XRF methods.

      To clarify the relation of all results to the features described in this study, we removed the geochemical and sedimentological samples from other sites within the Dinaledi Subsystem. These localities within the fissure network represent only surface collection of sediment, as no excavation results are available from those sites to allow for comparison in the context of assessing evidence of burial. These were initially included for comparison, but have now been removed to avoid confusion.  

      Micromorphology of sediments:

      Some referees (1, 3, and 4) and other commentators (including Martinón-Torres et al. 2024) have suggested that the previous manuscript was deficient due to an insufficient inclusion of micromorphological analysis of sediments. Because these commentators have emphasized this kind of evidence as particularly important, we review here what we have included and how our revision has addressed this comment. Previous work in the Dinaledi Chamber (Dirks et al., 2015; 2017) included thin section illustrations and analysis of sediment facies, including sediments in direct association with H. naledi remains within the Puzzle Box area. The previous work by Wiersma and coworkers (2020) used micromorphological analysis as one of several approaches to test the formation history of Unit 3 sediments in the Dinaledi Subsystem, leading to the interpretation of autobrecciation of earlier Unit 1 sediment. In the previous version of this manuscript we provided citations to this earlier work. The previous manuscript also provided new thin section illustrations of Unit 3 sediment near Dinaledi Feature 1 to place the disrupted layer of orange sediment (now designated the laminated orange silty mudstone unit) into context. 

      In the new revised manuscript we have added to this information in three ways. First, as noted above in response to reviewer 4, we have revised and added to our discussion of micromorphology within and adjacent to the Dinaledi Feature 1. Second, we have included more discussion in the Supplementary Information of previous descriptions of sediment facies and associated thin section analysis, with illustrations from prior work (CC-BY licensed) brought into this paper as supplementary figures, so that readers can examine these without following the citations. Third, we have included Figure 10 in the manuscript which includes six panels with microtomographic sections from the Hill Antechamber Feature. This figure illustrates the consistency of sub-unit 3b sediment in direct contact with H. naledi skeletal material, including anatomically associated skeletal elements, with previous analyses that demonstrate the angular outlines and chaotic orientations of LORM clasts. It also shows density contrasts of sediment in immediate contact with some skeletal elements, the loose texture of this sediment with air-filled voids, and apparent invertebrate burrowing activity. To our knowledge this is the first application of microtomography to sediment structure in association with a Pleistocene burial feature. 

      To forestall possible comments that the revised manuscript does not sufficiently employ micromorphological observations, or that any one particular approach to micromorphology is the standard, we present here some context from related studies of evidence from other research groups working at varied sites in Africa, Europe, and Asia. Hodgkins et al. (2021) noted: “Only a handful of micromorphological studies have been conducted on human burials and even fewer have been conducted on suspected burials from Paleolithic or hunter-gatherer contexts.” In that study, one supplementary figure with four photomicrographs of thin sections of sediments was presented. Interpretation of the evidence for a burial pit by Hodgkins et al. (2021) noted the more open microstructure of sediment but otherwise did not rely upon the thin section data in characterizing the sediments associated with grave fill. Martinón-Torres et al. (2021) included one Extended Data figure illustrating thin sections of sediments and bone, with two panels showing sediments (the remainder showing bone histology). The micromorphological analysis presented in the supplementary information of that paper was restricted to description of two microfacies associated with the proposed “pit” in that study. That study did carry out microCT scanning of the partially-prepared skeletal remains but did not report any sediment analysis from the microtomographic results. Maloney et al. (2022) reported no micromorphological or thin section analysis. Pomeroy et al. (2020a) included one illustration of a thin section; this study may be regarded as a preliminary account rather than a full description of the work undertaken. Goldberg et al. (2017) analyzed the geoarchaeology of the Roc de Marsal deposits in which possible burial-associated sediments had been fully excavated in the 1960s, providing new morphological assessments of sediment facies; the supplementary information to this work included five scans (not microscans) of sediment thin sections and no microphotographs. Fewlass et al. (2023) presented no thin section or micromorphological illustrations or methods. In summary of this research, we note that in one case micromorphological study provided observations that contributed to the evidence for a pit, in other cases micromorphological data did not test this hypothesis, and many researchers do not apply micromorphological techniques in their particular contexts.

      Sediment micromorphology is a growing area of research and may have much to provide to the understanding of ancient burial evidence as its standards continue to develop (Pomeroy et al. 2020b). In particular microtomographic analysis of sediments, as we have initiated in this study, may open new horizons that are not possible with more destructive thin-section preparation. In this manuscript, the thin section data reveals valuable evidence about the disruption of sediment structure by features within the Dinaledi Chamber, and microtomographic analysis further documents that the Hill Antechamber Feature reflects similar processes, in addition to possible post-burial diagenesis and invertebrate activity. Following up in detail on these processes will require further analysis outside the scope of this manuscript. 

      Access into the Dinaledi Subsystem:

      Reviewer 1 emphasizes the difficulty of access into the Dinaledi Subsystem as a reason why the burial hypothesis is not parsimonious. Similar comments have been made by several outside commentators who question whether past accessibility into the Dinaledi Subsystem may at one time have been substantially different from the situation documented in previous work. Several pieces of evidence are relevant to these questions and we have included some discussion of them in the Introduction, and additionally include a section in the Supplementary Information (“Entrances to the cave system”) to provide additional context for these questions. Homo naledi remains are found not only within the Dinaledi Subsystem but also in other parts of the cave system including the Lesedi Chamber, which is similarly difficult for non-expert cavers to access. The body plan, mass, and specific morphology of H. naledi suggest that this species would be vastly more suited to moving and climbing within narrow underground passages than living people. On this basis it is not unparsimonious to suggest that the evidence resulted from H. naledi activity within these spaces. We note that the accessibility of the subsystem is not strictly relevant to the hypothesis of cultural burial, although the location of the remains does inform the overall context which may reflect a selection of a location perceived as special in some way. 

      Stuffing bodies down the entry to the subsystem:

      Reviewer 3 suggests that one explanation for the emplacement of articulated remains at the top of the sloping floor of the Hill Antechamber is that bodies were “stuffed” into the chute that comprises the entry point of the subsystem and passively buried by additional accumulation of remains. This was one hypothesis presented in earlier work (Dirks et al. 2015) and considered there as a minimal explanation because it did not entail the entry of H. naledi individuals into the subsystem. The further exploration (Elliott et al. 2021) and ongoing survey work, as well as this manuscript, have all resulted in data that reject this hypothesis. The revised manuscript includes a section in the Results, “Deposition upon a talus with passive burial”, that examines this hypothesis in light of the data.

      Recognition of pits:

      Referees 3 and 4 and several additional commentators have emphasized that the recognition of pit features is necessary to the hypothesis of burial, and questioned whether the data presented in the manuscript were sufficient to demonstrate that pits were present. We have revised the manuscript in several ways to clarify how all the different kinds of evidence from the subsystem test the hypothesis that pits were present. This includes the presentation of a minimal definition of burial to include a pit dug by hominins, criteria for recognizing that a pit was present, and an evaluation of the evidence in each case to make clear how the evidence relates to the presence of a pit and subsequent infill. As referee 3 notes, it can be challenging to recognize a pit when sediment is relatively homogeneous. This point was emphasized in the review by Pomeroy and coworkers (2020b), who reflected on the difficulty of seeing evidence for shallow pits constructed by hominins, and we have cited this in the main text. As a result, the evidence for pits has been a recurrent topic of debate for most Pleistocene burial sites. However, in addition to the sedimentological and contextual evidence in the cases we describe, the current version also reflects upon other possible mechanisms for the accumulation of bones or bodies. The data show that the sedimentary fill associated with the H. naledi remains in the cases we examine could not have passively accumulated slowly and is not indicative of mass movement by slumping or other high-energy flow. To further put these results into context, we added a section to the Discussion that briefly reviews prior work on distinguishing pits in Pleistocene burial contexts, including the substantial number of sites with accepted burial evidence for which no evidence of a pit is present.

      Extent of articulation and anatomical association:

      We have added significantly greater detail to the descriptions of articulated remains and orientation of remains in order to describe more specifically the configuration of the skeletal material. We also provide 14 figures in main text (13 of them new) to illustrate the configuration of skeletal remains in our data. For the Puzzle Box area, this now includes substantial evidence on the individuation of skeletal fragments, which enables us to illustrate the spatial configuration of remains associated with the DH7 partial skeleton, as well as the spatial position of fragments refitted as part of the DH1, DH2, DH3, and DH4 crania. For Dinaledi Feature 1 and the Hill Antechamber Feature we now provide figures that key skeletal parts as identified, including material that is unexcavated where possible, and a skeletal part representation figure for elements excavated from Dinaledi Feature 1. 

      Archaeothanatology:

      Reviewer 2 suggests that a greater focus on the archaeothanatology literature would be helpful to the analysis, with specific reference to the sequence of joint disarticulation, the collapse of sediment and remains into voids created by decomposition, and associated fragmentation of the remains. In the revised manuscript we have provided additional analysis of the Hill Antechamber Feature with this approach in mind. This includes greater detail and illustration of our current hypothesis for individuation of elements. We now discuss a hypothesis of body disposition, describe the persistent joints and articulation of elements, and examine likely decomposition scenarios associated with these remains. Additionally, we expand our description and illustration of the orientation of remains and degree of anatomical association and articulation within Dinaledi Feature 1. For this feature and for the Hill Antechamber Feature we have revised the text to describe how fracturing and crushing patterns are consistent with downward pressure from overlying sediment and material. In these features, postdepositional fracturing occurred subsequent to the decomposition of soft tissue and partial loss of organic integrity of the bone. We also indicate that the loss by postdepositional processes of most long bone epiphyses, vertebral bodies, and other portions of the skeleton less rich in cortical bone, poses a challenge for testing the anatomical associations of the remaining elements. This is a primary reason why we have taken a conservative approach to identification of elements and possible associations. 

      A further aspect of the site revealed by our analysis is the selective reworking of sediments within the Puzzle Box area subsequent to the primary deposition of some bodies. The skeletal evidence from this area includes body parts with elements in anatomical association or articulation, juxtaposed closely with bone fragments at varied pitch and orientation. This complexity of events evidenced within this area is a challenge for approaches that have been developed primarily based on comparative data from single-burial situations. In these discussions we deepen our use of references as suggested by the referee.   

      Burial positions:

      Reviewer 2 further suggests that illustrations of hypothesized burial positions would be valuable. We recognize that a hypothesized burial position may be an appealing illustration, and that some recent studies have created such illustrations in the context of their scientific articles. However such illustrations generally include a great deal of speculation and artist imagination, and tend to have an emotive character. We have added more discussion to the manuscript of possible primary disposition in the case of the Hill Antechamber Feature as discussed above. We have not created new illustrations of hypothesized burial positions for this revision. 

      Carnivore involvement:

      Referee 1 suggests that the manuscript should provide further consideration of whether carnivore activity may have introduced bones or bodies into the cave system. The reorganized Introduction now includes a review of previous work, with an expanded discussion in the Supplementary Information (“Hypotheses tested in previous work”). This includes a review of literature on the topic of carnivore accumulation and the evidence from the Dinaledi and Lesedi Chambers that rejects this hypothesis.

      Water transport and mud:

      The eLife referees broadly accepted previous work showing that water inundation or mass flow of water-saturated sediment did not occur within the history of Unit 2 and 3 sediments, including those associated with H. naledi remains. However several outside commentators did refer specifically to water flow or mud flow as a mechanism for slumping of deposits and possible sedimentary covering of the remains. To address these comments we have added a section to the Supplementary Information (“Description of the sedimentary deposits of the Dinaledi Subsystem”) that reviews previous work on the sedimentary units and formation processes documented in this area. We also include a subsection specifically discussing the term “mud” as used in the description of the sedimentology within the system, as this term has clearly been confusing for nonspecialists who have read and commented on the work. We appreciate the referees’ attention to the previous work and its terminology.

      Redescription of areas of the cave system:

      Reviewer 1 suggests that a detailed reanalysis of all portions of the cave system in and around the Dinaledi Subsystem is warranted to reject the hypothesis that bodies entered the space passively and were scattered from the floor by natural (i.e. noncultural) processes. The referee suggests that National Geographic could help us with these efforts. To address this comment we have made several changes to the manuscript. As noted above, we have added material in Supplementary Information to review the geochronology of the Dinaledi Subsystem and nearby Dragon’s Back Chamber, together with a discussion of the connections between these spaces. 

      Most directly in response to this comment we provide additional documentation of the possibility of movement of bodies or body parts by gravity within the subsystem itself. This includes detailed floor maps based on photogrammetry and LIDAR measurement, where these are physically possible, presented in Figures 2 and 3. In some parts of the subsystem the necessary equipment cannot be used due to the extremely confined spaces, and for these areas our maps are based on traditional survey methods. In addition to plan maps we have included a figure showing the elevation of the subsystem floor in a cross-section that includes key excavation areas, showing their relative elevation. All figures that illustrate excavation areas are now keyed to their location with reference to a subsystem plan. These data have been provided in previous publications but the visualization in the revised manuscript should make the relationship of areas clear for readers. The Introduction now includes text that discusses the configuration of the Hill Antechamber, Dinaledi Chamber, and nearby areas, and also discusses the instances in which gravity-driven movement may be possible, at the same time reviewing that gravity-driven movement from the entry point of the subsystem to most of the localities with hominin skeletal remains is not possible. 

      Within the Results, we have added a section on the relationship of features to their surroundings in order to assist readers in understanding the context of these bone-bearing areas and the evidence this context brings to the hypothesis in question. We have also included within this new section a discussion of the discrete nature of these features, a question that has been raised by outside commentators. 

      Passive sedimentation upon a cave floor or within a natural depression:

      Reviewer 3 suggests that the situation in the Dinaledi Subsystem may be similar to a European cave where a cave bear skeleton might remain articulated on a cave floor (or we can add, within a hollow for hibernation), later to be covered in sediment. The reviewer suggests that articulation is therefore no evidence of burial, and suggests that further documentation of disarticulation processes is essential to demonstrating the processes that buried the remains. We concur that articulation by itself is not sufficient evidence of cultural burial. To address this comment we have included a section in the Results that tests the hypothesis that bodies were exposed upon the cave floor or within a natural depression. To a considerable degree, additional data about disarticulation processes subsequent to deposition are provided in our reanalysis of the Puzzle Box area, including evidence for selective reworking of material after burial. 

      Postdepositional movement and floor drains:

      Reviewer 3 notes that previous work has suggested that subsurface floor drains may have caused some postdepositional movement of skeletal remains. The hypothesis of postdepositional slumping or downslope movement has also been discussed by some external commentators (including Martinón-Torres et al. 2024). We have addressed this question in several places within the revised manuscript. As we now review, previous discussion of floor drains attempted to explain the subvertical orientation of many skeletal elements excavated from the Puzzle Box area. The arrangement of these bones reflects reworking as described in our previous work, and without considering the possibility of reworking by hominins, one mechanism that conceivably might cause reworking was downward movement of sediments into subsurface drains. Further exploration and mapping, combined with additional excavation into the sediments beneath the Puzzle Box area provided more information relevant to this hypothesis. In particular this evidence shows that subsurface drains cannot explain the arrangement of skeletal material observed within the Puzzle Box area. As now discussed in the text, the reworking is selective and initiated from above rather than below. This is best explained by hominin activity subsequent to burial. 

      In a new section of the Results we discuss slumping as a hypothesis for the deposition of the remains. This includes discussion of downslope movement within the Hill Antechamber and the idea that floor drains may have been a mechanism for sediment reworking in and around the Puzzle Box area and Dinaledi Feature 1. As described in this section the evidence does not support these hypotheses. 

      Hypothesis testing and parsimony:

      Referees 1 and 3 and the editorial guidance all suggested that a more appropriate presentation would adopt a null hypothesis and test it. The specific suggestion that the null hypothesis should be a natural sedimentary process of deposition was provided not only by these reviewers but also by some outside commentators. To address this comment, we have edited the manuscript in two ways. The first is the addition of a section to the Discussion that specifically discusses hypothesis testing and parsimony as related to Pleistocene evidence of cultural burial. This includes a brief synopsis of recent disciplinary conversations and citation of work by other groups of authors, none of whom adopted this “null hypothesis” approach in their published work. 

      As we now describe in the manuscript, previous work on the Dinaledi evidence never assumed any role for H. naledi in the burial of remains. Reading the reviewer reports caused us to realize that this previous work had followed exactly the “null hypothesis” approach that some suggested we follow. By following this null hypothesis approach, we neglected a valuable avenue of investigation. In retrospect, we see how this approach impeded us from understanding the pattern of evidence within the Puzzle Box area. Thus in the revised manuscript we have mentioned this history within the Discussion and also presented more of the background to our previous work in the Introduction. Hopefully by including this discussion of these issues, the manuscript will broaden conversation about the relation of parsimony to these issues. 

      Language and presentation style:

      Reviewer 4 criticizes our presentation, suggesting that the text “gives the impression that a hypothesis was formulated before data were collected.” Other outside commentators have mentioned this notion also, including Martinón-Torres et al. (2024) who suggest that the study began from a preferred hypothesis and gathered data to support it. The accurate communication of results and hypotheses in a scientific article is a broader issue than this one study. Preferences about presentation style vary across fields of study as well as across languages. We do not regret using plain language where possible. In any study that combines data and methods from different scientific disciplines, the use of plain language is particularly important to avoid misunderstandings where terms may mean different things in different fields. 

      The essential question raised by these comments is whether it is appropriate to present the results of a study in terms of the hypothesis that is best supported. As noted above, we read carefully many recent studies of Pleistocene burial evidence. We note that in each of these studies that concluded that burial is the best hypothesis, the authors framed their results in the same way as our previous manuscript: an introduction that briefly reviews background evidence for treatment of the dead, a presentation of results focused on how each analysis supports the hypothesis of burial for the case, and then in some (but not all) cases discussion of why some alternative hypotheses could be rejected. We do not infer from this that these other studies started from a presupposition and collected data only to confirm it. Rather, this is a simple matter of presentation style. 

      The alternative to this approach is to present an exhaustive list of possible hypotheses and to describe how the data relate to each of them, at the end selecting the best. This is the approach that we have followed in the revised manuscript, as described above under the direction of the reviewer and editorial guidance. This approach has the advantage of bringing together evidence in different combinations to show how each data point rejects some hypotheses while supporting others. It has the disadvantage of length and repetition. 

      Possible artifact:

      We have chosen to keep the description of the possible artifact associated with the Hill Antechamber Feature in the Supplementary Information. We do this while acknowledging that this is against the opinion of reviewer 4, who felt the description should be removed unless the object in question is fully excavated and physically analyzed. The previous version of the manuscript did not rely upon the stone as positive evidence of grave goods or symbolic content, and it noted that the data do not test whether the possible artifact was placed or was intentionally modified. However this did not satisfy reviewer 4, and some outside commentators likewise asserted that the object must be a “geofact” and that it should be removed. 

      We have three arguments against this line of thinking. First, we do not omit data from our reporting. Whether Homo naledi shaped the rock or not, used it as a tool or not, whether the rock was placed with the body or not, it is unquestionably there. Omitting this one object from the report would be simply dishonest. Second, the data on this rock are at 16 micron resolution. While physical inspection of its surface may eventually reveal trace evidence and will enable better characterization of the raw material, no mode of surface scanning will produce better evidence about the object’s shape. Third, the position of this possible artifact within the feature provides significant information about the deposition of the skeletal material and associated sediments. The pitch, orientation, and position of the stone are not consistent with slow deposition but are consistent with the hypothesis that the surrounding sediment was rapidly emplaced at the same time as the articulated elements less than 2 cm away.

      In the current version, we have redoubled our efforts to provide information about the position and shape of this stone while not presupposing the intentionality of its shape or placement. We add here that the attitude expressed by referee 4 and other commentators, if followed at other sites, would certainly lead to the loss or underreporting of evidence, which we are trying to avoid.  

      Consistency versus variability of behavior:

      As described in the revised manuscript, different features within the Dinaledi Subsystem exhibit some shared characteristics. At the same time, they vary in positioning, representation of individuals and extent of commingling. Other localities within the subsystem and broader cave system present different evidence. Some commentators have questioned whether the patterning is consistent with a single common explanation, or whether multiple explanations are necessary. To address this line of questioning, we have added several elements to the manuscript. We created a new section on secondary cultural burial, discussing whether any of the situations may reflect this practice. In the Discussion, we briefly review the ways in which the different features support the involvement of H. naledi without interpreting anything about the intentionality or meaning of the behavior. We further added a section to the Discussion to consider whether variation among the features reflects variation in mortuary practices by H. naledi. One aspect of this section briefly cites variation in the location and treatment of skeletal remains at other sites with evidence of burial. 

      Grave goods:

      Some commentators have argued that grave goods are a necessary criterion for recognizing evidence of ancient burial. We added a section to the Discussion to review evidence of grave goods at other Pleistocene sites where burial is accepted. 

      References:

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. eLife, 4, e09561. https://doi.org/10.7554/eLife.09561

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. eLife, 6, e24231. https://doi.org/10.7554/eLife.24231

      • Elliott, M., Makhubela, T., Brophy, J., Churchill, S., Peixotto, B., Feuerriegel, E., Morris, H., Van Rooyen, D., Ramalepa, M., Tsikoane, M., Kruger, A., Spandler, C., Kramers, J., Roberts, E., Dirks, P., Hawks, J., & Berger, L. R. (2021). Expanded Explorations of the Dinaledi Subsystem, Rising Star Cave System, South Africa. PaleoAnthropology, 2021(1), 15–22. https://doi.org/10.48738/2021.iss1.68

      • Fewlass, H., Zavala, E. I., Fagault, Y., Tuna, T., Bard, E., Hublin, J.-J., Hajdinjak, M., & Wilczyński, J. (2023). Chronological and genetic analysis of an Upper Palaeolithic female infant burial from Borsuka Cave, Poland. iScience, 26(12). https://doi.org/10.1016/j.isci.2023.108283

      • Foecke, Kimberly K., Queffelec, Alain, & Pickering, Robyn. (n.d.). No Sedimentological Evidence for Deliberate Burial by Homo naledi – A Case Study Highlighting the Need for Best Practices in Geochemical Studies Within Archaeology and Paleoanthropology. PaleoAnthropology, 2024. https://doi.org/10.48738/202x.issx.xxx

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015. https://doi.org/10.1007/s12520-013-0163-2

      • Maloney, T. R., Dilkes-Hall, I. E., Vlok, M., Oktaviana, A. A., Setiawan, P., Priyatno, A. A. D., Ririmasse, M., Geria, I. M., Effendy, M. A. R., Istiawan, B., Atmoko, F. T., Adhityatama, S., Moffat, I., Joannes-Boyau, R., Brumm, A., & Aubert, M. (2022). Surgical amputation of a limb 31,000 years ago in Borneo. Nature, 609(7927), 547–551. https://doi.org/10.1038/s41586-022-05160-8

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), Article 7857. https://doi.org/10.1038/s41586-021-03457-8

      • Martinón-Torres, M., Garate, D., Herries, A. I. R., & Petraglia, M. D. (2023). No scientific evidence that Homo naledi buried their dead and produced rock art. Journal of Human Evolution, 103464. https://doi.org/10.1016/j.jhevol.2023.103464

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020a). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26. https://doi.org/10.15184/aqy.2019.207

      • Pomeroy, E., Hunt, C. O., Reynolds, T., Abdulmutalb, D., Asouti, E., Bennett, P., Bosch, M., Burke, A., Farr, L., Foley, R., French, C., Frumkin, A., Goldberg, P., Hill, E., Kabukcu, C., Lahr, M. M., Lane, R., Marean, C., Maureille, B., … Barker, G. (2020b). Issues of theory and method in the analysis of Paleolithic mortuary behavior: A view from Shanidar Cave. Evolutionary Anthropology: Issues, News, and Reviews, 29(5), 263–279. https://doi.org/10.1002/evan.21854

      • Robbins, J. L., Dirks, P. H. G. M., Roberts, E. M., Kramers, J. D., Makhubela, T. V., Hilbert-Wolf, H. L., Elliott, M., Wiersma, J. P., Placzek, C. J., Evans, M., & Berger, L. R. (2021). Providing context to the Homo naledi fossils: Constraints from flowstones on the age of sediment deposits in Rising Star Cave, South Africa. Chemical Geology, 567, 120108. https://doi.org/10.1016/j.chemgeo.2021.120108

      • Wiersma, J. P., Roberts, E. M., & Dirks, P. H. G. M. (2020). Formation of mud clast breccias and the process of sedimentary autobrecciation in the hominin-bearing (Homo naledi) Rising Star Cave system, South Africa. Sedimentology, 67(2), 897–919. https://doi.org/10.1111/sed.12666

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work set out to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus HVC in zebra finches, to understand how sensory (auditory) and motor circuits interact to coordinate song production and learning. The authors optimized an AAV-based optogenetic technique to manipulate inputs from one specific auditory area at a time, and recorded synaptic activity with whole-cell recordings in slice preparations, identifying each neuron's projection target by retrograde tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) and three classes of projection neurons in HVC; all areas provide monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but the proportion of connections onto each projection neuron class varied. They also find specific reciprocal connections between mMAN and Av. Taken together, the authors provide a map of the synaptic connections between intercortical sensory and motor areas that are suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 by using AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With their technical advance and thorough experiments, they provide a detailed map of synaptic connections.

      Weaknesses:

      Because the study was performed in brain slices, the functional implications of the synaptic connectivity are limited. In particular, because all the experiments were done in adult preparations, there could be a gap when discussing functions in developmental song learning.

      We thank the reviewer for their appreciation of our work. Although we agree that there can be limitations to brain slice preparations, the approaches used here for synaptic connectivity mapping are well-designed to identify long-range synaptic connectivity patterns. Optogenetic stimulation of axon terminals in brain slices does not require intact axons and works well when axons are cut, allowing identification of all inputs expressing optogenetic channels from afferent regions. Terminal stimulation in slices yields stable post-synaptic responses for hours without rundown, assuring that polysynaptic and monosynaptic connections can be reliably identified in our brain slices. Additionally, conducting similar types of experiments in vivo can run into important limitations. First, the extent of TTX and 4-AP diffusion, which is necessary for identification of long-range monosynaptic connections, can be difficult to verify in vivo, potentially confounding identification of monosynaptic connectivity. Second, conducting whole-cell patch-clamp experiments in vivo, particularly in deeper brain regions, is technically challenging, and would limit the number of cells that can be patched and increase the number of animals needed.

      We agree that there may well be important differences between adult connectivity and connectivity patterns in the juvenile brain. Indeed, learning and experience during development almost certainly shape connectivity patterns and these patterns of connectivity may change incrementally and/or dynamically during development. Ultimately, adult connectivity patterns are the result of changes in the brain that accrue over development. Given that this is the first study mapping long-range connectivity of HVC input-output pathways, we reasoned that the adult connectivity would provide a critical reference allowing future studies to map different stages of juvenile connectivity and the changes in connectivity driven by milestones like forming a tutor song memory, sensorimotor learning, and song crystallization.

      In this revision we worked to better highlight the points raised above and thank the reviewer for their comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes synaptic connectivity in the songbird cortex's four main classes of sensory neuron afferents onto three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. Investigators here use all male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale; this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images, but it would be nice to have these linked with a zebra finch atlas in more of a cartoon format accompanying each fluorescence image. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just given in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

      We thank the reviewer for their assessment of our research and suggestions. We have implemented many of these suggestions and provide details in our response to their specific Recommendations. Additionally, we are organizing our data and will make it publicly available with the version of record.

      Reviewer #3 (Public review):

      Nucleus HVC is critical both for song production as well as learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet the underlying circuit organization remains relatively poorly understood. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

      We thank the reviewer for their thoughtful assessment of our research.

      Recommendations for the authors:

      The following recommendations were considered by all reviewers to be important to incorporate for improving this paper:

      (1) Clarify the site of viral injection and the possibility of labeling other structures

      a) Show images of viral injection sites.

      We provide a representative image of viral expression for each pathway studied in this manuscript. Please see panel A in Figures 2-3 and 5-6 showing our viral expression in Uva, NIf, mMAN, and Av respectively.  

      b) Include in discussion caveats that the virus may spread beyond the boundaries of structures (e.g. especially injections into NIf could spread into Field L).

      For each HVC afferent nucleus we have now included a sentence describing the possible spread of viral infection in surrounding structures in the Results. We also now expanded the image from the Av section to include NIf, to showcase lack of viral expression in NIf (see Fig. 6A).

      (2) Clarify the logic and precise methods of the TTX and 4-AP experiments

      a) Please see the detailed issue raised by Reviewer 3, Major Point 1 below.

      The TTX and 4AP application is the gold standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu, Mao et al. 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders, Supiot et al. 2022). We now better describe the logic of this approach in the second paragraph of the Results section and cite the first description of this method from the Svoboda lab and a recent review weighing this method against other optogenetic methods for tracing synaptic connections in the brain.

      (3) Include caveats in discussion

      a) Note that there may be other inputs to HVC that were not examined in this study (e.g. CMM, Field L)

      In our original manuscript we did state “Although a complete description of HVC circuitry will require the examination of other potential inputs (i.e. RA<sub>HVC</sub> PNs, A11 glutamatergic neurons (Roberts, Klein et al. 2008, Ben-Tov, Duarte et al. 2023)) and a characterization of interneuron synaptic connectivity, here we provide a map of the synaptic connections between the 4 best described afferents to HVC and its 3 populations of projection neurons” in the last paragraph of the Discussion. We have now edited this sentence to include the projection from NCM to HVC and cited Louder et al., 2024.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC; rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017) which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      b) Also note that birds in this study were adults and that some inputs to HVC likely to be important for learning may recede during development (e.g. Louder et al, 2024).

      In the second to last paragraph of the Discussion we now state: While our opsin-assisted circuit mapping provides us with a new level of insight into HVC synaptic circuitry, there are limitations to this research that should be considered. All circuit mapping in this study was carried out in brain slices from adult male zebra finches. Future studies will be needed to examine how this adult connectivity pattern relates to patterns of connectivity in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds.   

      (4) Consider cosmetic changes to figures as suggested by Reviewers 2-3 below.

      We thank the reviewers for their suggestions and have implemented the changes as best we can.

      (5) Address all minor issues raised below.

      Reviewer #1 (Recommendations for the authors):

      I see this study is well designed to answer the authors' specific question, mapping synaptic auditory-motor connections within HVC. Their experiments with advanced techniques of projection-specific optogenetic manipulation of synaptic inputs and retrograde identification of projection areas revealed input-output combination selective synaptic mapping.

      As I found this study advanced our knowledge with the compelling dataset, I have only some minor comments here.

      (1) One technical concern is that we don't see how well the virus infection was confined to the target area, or whether we can ignore the effect of synaptic connectivity from surrounding areas. As the amount of virus they injected is large (1.5 µl) and the target areas are small, we assume the virus might spread to the surrounding area, such as Field L, which also projects to HVC, when targeting NIf. While I think the majority of the projections were from their target areas, it would be better to mention (and also show images with larger view areas) the possibility of projections from surrounding areas.

      We share the reviewer’s concern about the specificity of viral expression. For this reason, we included sample images of the viral expression in each target area (panel A in Fig. 2,3,5,6). We have now also included a sentence at the beginning of each subsection of our Results to describe how we have ensured interpretability of the results. Uva and mMAN’s surrounding areas are not known to project to HVC. Possible cross-infection is an issue for Av and NIf, and we checked each bird’s injection site to ensure that eGtACR1+ cells were not visible in the unintended HVC-projecting areas.

      As mentioned in our response to the public comment, consistent with Vates (Vates, Broome et al. 1996) we do not see evidence that Field L projects directly to HVC (see Fig. 3G).

      (2) Another technical concern is the damage to axonal projections. While I understand the authors stimulated axonal terminals, axonal projections were presumably cut, and their ability to release neurotransmitters would be reduced, especially after long-term survival or repeated stimulation. Mentioning whether projection pathways were within their 230 µm-thick slice (probably depends on input sites) or not, and the effect of axonal cut, would be helpful.

      We agree that slice electrophysiology has limitations. However, we disagree with the claim of reduced reliability or stability of the evoked response. We and others find that repeated electrical and optogenetic terminal stimulation in slices can yield stable post-synaptic responses for tens of minutes and even hours (Bliss and Gardner-Medwin 1973, Bliss and Lomo 1973, Liu, Kurotani et al. 2004, Pastalkova, Serrano et al. 2006, Xu, Yu et al. 2009, Trusel, Cavaccini et al. 2015, Trusel, Nuno-Perez et al. 2019). Indeed, long-term synaptic plasticity experiments in most preparations and across brain areas rely on such stability of the presynaptic machinery for synaptic release, despite axons being severed from their parent soma. Our assumption is that the vast majority, if not all, connections between axon terminals and their cell body in the afferent regions have been cut in our preparations. Nonetheless, the diversity of outcomes we report (currents returning after TTX+4AP or not, depending on the specific combination of input and HVC-PN class) is consistent with the robustness of the synaptic interrogation method.

      (3) While I understand this study focused on 4 major input areas and the authors provide good pictures of synaptic HVC connections from those areas, HVC has been reported to receive auditory inputs from other areas as well (CMM, Field L, etc.). It is worth mentioning that there are other auditory inputs and would be interesting to discuss coordination with the inputs from other areas.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC; rather, it projects to the shelf region outside of HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017) which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      (4) The HVC local neuronal connections have been reported to be modified, and a recent study revealed transient auditory inputs into HVC during the song learning period. The authors discuss the functions of HVC synaptic connections in song learning (the title also refers to synaptic connections for song learning); however, the experiments were done in adults and the authors do not discuss the possibility of different synaptic connection mapping in juveniles in the song learning period. Mentioning the neuronal activities and connectivity changes during song learning is important. Also, it would be helpful for the readers to discuss the potential differences between juveniles/adults if they want to discuss the functions of song learning.

      We now mention in the Discussion that this is an important caveat of our research and that future studies will be needed to examine how these adult connectivity patterns relate to connectivity patterns in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds. Nonetheless, the title and abstract cite song learning because it is important for the broader public to understand that at least some of these afferent brain regions carry an essential role in song learning (Foster and Bottjer 2001, Roberts, Gobes et al. 2012, Roberts, Hisey et al. 2017, Zhao, Garcia-Oscos et al. 2019, Koparkar, Warren et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      The work is very detailed and will be an important resource to those working in the field. The recordings are of a high quality and lots of information is included, such as measures of response kinetics, amplitude, and pharmacological confirmation of excitatory and inhibitory synaptic responses. In general, I feel the quality is extremely high and the quantity of data is on a very significant, exhaustive scale that will certainly aid the field. I have come to this conclusion as a non-zebra finch person, but I feel the connection information shown will be of benefit given its high quality.

      Figure 7 is a nice way of showing the overall organization. Optional suggestion, consider highlighting anything in Figure 7 that results in a new understanding of the song system as compared to previous work on anatomy and function.

      We thank the reviewer for the kind comments about our research. We have highlighted our newly found connection between mMAN and Av. All the connections onto the HVC PNs in Panel B are also newly identified in this study.

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Clarification regarding methods for determining monosynaptic events:

      One of the manipulations that I struggled the most with was the use of TTX + 4AP to isolate monosynaptic events. Not being as familiar with the use of optically based photostimulation of axons to release transmitter locally, I was initially confused by statements such as "we found that oEPSC returned after application of TTX+4AP". This might be clear to someone performing these manipulations, but a bit more clarification would be helpful. Should I assume that an existing monosynaptic EPSC would be masked by co-occurring polysynaptic IPSCs which disappear following application of TTX + 4AP, thereby unmasking the monosynaptic EPSC and causing the EPSC to "return"? I am not sure that word works. Continuing my confusion with these experiments, I am unsure how this cocktail of drugs is added, if it is even added as a cocktail, which is what I initially assumed. The methods and the results are not clear on whether the drugs are added in sequence, and why, and whether traces are recorded after the addition of both drugs or are recorded for TTX and then again for TTX + 4AP. Finally, looking at the traces in the experimental figures (e.g. Figures 2F, 3F, 5F, and 6F), it is difficult to see what is being shown, at least for me. First, the authors need to describe better in the results why they stimulate twice in short succession and why they seem to use the response to the second pulse (unless I am mistaken) to measure the monosynaptic event. Second, I was confused by the traces (which are very small) in the presence of TTX. I would have expected to see a response if there was a monosynaptic EPSC but I only seem to see a flat line.

      The confusion that I list above might be due in part to my ignorance, but it is important in these types of papers not to assume too much expertise if you want readers with a less sophisticated understanding of synaptic physiology to understand the data. In other words, a little bit more clarity and hand-holding would be welcome.

      We understand the reviewer’s confusion about the methodology. In voltage clamp, the amplifier injects current through the electrode to hold the membrane voltage at -70 mV, which is near the equilibrium potential for Cl-; therefore, the only synaptic current evoked by light stimulation is due to cation influx, mainly through AMPA receptors (see Fig. 1). Co-occurring polysynaptic IPSCs would therefore not be visible. We examine those by holding the membrane voltage at +10 mV (see Fig. 1). TTX application blocks voltage-dependent Na+ channels and therefore stops all neurotransmission. We show the traces upon TTX to show that currents we were recording prior to TTX application were of synaptic origin, and not due to accidental expression of opsin in the patched cell. Also, this ensures that any current visible after 4AP application is due to monosynaptic transmission and not to a failure of TTX application.

      After recording and light stimulation with TTX, we then add 4AP, which is a blocker of presynaptic K+ channels. This prevents the repolarization of the terminals that would occur in response to opsin-mediated local depolarization. 4AP application, therefore, allows local opsin-driven depolarizations to reach the threshold for Ca2+-dependent vesicle docking and release. This procedure selectively reveals or unmasks the monosynaptic currents because any non-monosynaptically connected neuron would still need voltage-dependent Na+ channels to effectively produce indirect neurotransmission onto the patched cell. The TTX and 4AP application is the gold standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders et al., 2022). We now include 2 more sentences near the beginning of the Results to clarify this process and directly point to the Linders review for researchers wanting a deeper explanation of this technique.

      The double stimulation is unrelated to our testing of monosynaptic connections. We originally conducted the experiments by delivering 2 pulses of light separated by 50 ms, a common way to examine the paired-pulse ratio (PPR), a physiological measure which is used to probe synapses for short-term plasticity and release probability. However, through discussions with colleagues we realized that the slow decay time of eGtACR1 may complicate interpretation of the response to the second light pulse. Thus, we elected to not report these results and indicated this in the Methods section: “We calculated the paired-pulse ratio (PPR) as the amplitude of the second peak divided by the amplitude of the first peak elicited by the twin stimuli, however due to slow kinetics of eGtACR1 the results would be difficult to interpret, and therefore we are not currently reporting them.”
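      For readers unfamiliar with the measure, the PPR definition quoted above (amplitude of the second peak divided by amplitude of the first peak) can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function names, the fixed-baseline peak measurement, the 1 ms sampling interval, and the example trace values are all hypothetical.

```python
from typing import Sequence


def peak_amplitude(trace: Sequence[float], start: int, stop: int,
                   baseline: float = 0.0) -> float:
    """Return the largest absolute deviation from baseline in trace[start:stop]."""
    window = list(trace[start:stop])
    if not window:
        raise ValueError("empty analysis window")
    return max(abs(sample - baseline) for sample in window)


def paired_pulse_ratio(first_peak: float, second_peak: float) -> float:
    """PPR = amplitude of the second peak / amplitude of the first peak."""
    if first_peak == 0:
        raise ValueError("first peak amplitude is zero; PPR is undefined")
    return second_peak / first_peak


# Hypothetical current trace (pA), one sample per ms (assumed sampling interval),
# with two inward responses whose onsets are 50 ms apart.
trace = [0.0] * 120
trace[10:15] = [-40.0, -100.0, -80.0, -50.0, -20.0]  # response to light pulse 1
trace[60:65] = [-30.0, -75.0, -60.0, -35.0, -15.0]   # response to light pulse 2

first = peak_amplitude(trace, 10, 60)    # window following pulse 1
second = peak_amplitude(trace, 60, 110)  # window following pulse 2
ppr = paired_pulse_ratio(first, second)  # values below 1 indicate depression
```

      The sketch measures both peaks from the same fixed baseline. With an opsin whose current decays slowly, the second response may ride on residual current from the first, which is the interpretive problem described above and the reason the PPR values are not reported.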

      (2) Suggestions for improving summary figures:

      Summary Figure 1a: The circuit diagram (schematic to the right of 1a) is OK but I initially found it a bit difficult to interpret. For example, it is not clear why pink RA projecting neurons don't reach as far to the right as X or Av projecting neurons, suggesting that they are not really projection neurons. Also, the big question marks in the intermediate zone are not entirely intuitive. It seems there might be a better way of representing this. It might also be worth stating in the figure legend that the interconnectivity patterns shown in the figure between PNs in HVC are based on specific prior studies.

      We thank the reviewer for the constructive criticism. We have modified the figure to extend the RA projection line and mentioned in the figure legend that connectivity between PNs is based on prior studies.

      Summary Figure 1a: I am not sure I love this figure. There are a few minor issues. First, there are too many browns [NIf/Av and mMAN] which makes it more challenging to clearly disambiguate the different projections. Second, it is unclear why this figure does not represent projections from RA to HVC. My biggest concern with this figure is that it oversimplifies some of the findings. From the figure, one gets the impression that Uva only projects to RA-PNs and that Av only projects to X-PNs even though the authors show connections to other PNs. With the small sample size in this current study for each projection and each PN type, one really cannot rule out that these "minority" projections are important. I, therefore, suggest that the authors qualitatively represent the strength/probability of connections by weighting the thickness of afferent connections.

      We assume the reviewer is commenting on our summary figure panel 7B. We agree with the referee that this is a simplified representation of our findings. We had indeed indicated in the legend that this was just a “Schematic of the HVC afferent connectivity map resulting from the present work” and that “For conceptualization purposes, afferent connectivity to HVC-PNs is shown only when the rate of monosynaptic connectivity reaches 50% of neurons examined”. We have added a title to highlight that this is but a simplification. We have now adjusted the colors to make the figure easier to follow. Based on the reviewer’s critique we searched for a better method for summarizing the complex connectivity patterns described in this research. We settled on a Sankey diagram of connectivity. This is now Figure 7C. In this diagram, we are able to show the proportion of connections from each input pathway onto each class of neuron and whether these connections are poly- or monosynaptic. We find this to be a straightforward way of displaying all of the connectivity patterns identified in our figure 2-3 and 4-5, and we look forward to learning whether the reviewers find this a useful way of illustrating our findings.

      Minor points:

      (1) Line 50 - typo - song circuits.

      Thank you for catching this.

      (2) Line 106 - 111 - The findings suggest that 100% of Uva projections onto HVCRA neurons are monosynaptic. However, because the authors only tested 6 neurons their statements that their findings are so different from other studies, should be somewhat tempered since these other studies (e.g. Moll et al.) looked at 251 neurons in HVC and sampling bias could still somewhat explain the difference.

      We observed oEPSCs in 43 of 51 (84.3%) HVC-RA neurons recorded (mean rise time = 2.4 ms) and monosynaptic connections onto 100% of the HVC-RA neurons tested (n = 6). Moll et al. combined electrical stimulation of Uva with two-photon calcium imaging (GCaMP6s) of putative HVC-RA neurons (n = 251 neurons). We should note that these are putative HVC-RA neurons because they were not visually identified using retrograde tracing or some other molecular handle. They report that only ~16% of HVC-RA neurons showed reliable calcium responses following Uva stimulation. Although the experiments by Moll et al. are technically impressive, calcium imaging is an insensitive technique for measuring post-synaptic responses, particularly subthreshold responses, when compared to whole-cell patch-clamp recordings. This approach cannot identify monosynaptic connections and is likely sensitive only to suprathreshold activity, which likely relies on recruitment of other polysynaptic inputs onto the neurons in HVC. Furthermore, as indicated in the Discussion, our opsin-mediated synaptic interrogation recruits any eGtACR1+ Uva terminal in the slice and therefore has a high likelihood of revealing any existing connections.

      A limitation of whole-cell patch-clamp recording is that it is a laborious, low-throughput technique. Future experiments using better imaging approaches, like voltage imaging, may be able to weigh in on differences between what we report here using whole-cell patch-clamp recordings from visually identified HVC-RA neurons combined with optogenetic manipulations of Uva terminals and the calcium imaging results reported by Moll et al. Nonetheless, whole-cell patch-clamp recording combined with optogenetic manipulation is likely to remain the most sensitive method for identifying synaptic connectivity.

      (3) Figure 2G - the significance of white circles is not clear.

      The figure legend indicates that the white circles mark the position of “retrogradely labeled HVC-projecting neurons in Uva (cyan, white circles)” to facilitate identification of colocalization with the in-situ markers.

      (4) Line 135 - Cardin et al. (J. Neurophys. 2004) is the first to show that song production does not require Nif.

      We thank the reviewer for pointing this out and we have cited this important study.

      (5) Line 183 - This is a confusing sentence because I initially thought that mMAN-mMANHVC PNs was a category!

      We replaced the dash with a colon.

      (6) Figure 4d could use some arrows to identify what is shown. It is assumed that the box represents mMAN. Should it be assumed that Av is not in the plane of this section? If not, this should be stated in the legend. It is also unclear where the anterograde projections are. Is this the dark highway that goes from the box to the dorsal surface? If yes this should be indicated but it should also be made clear why the projections go both in the dorsal as well as the ventral directions.

      The inset, as indicated by the lines around it, is a magnification of the terminal fields in Av. We added an explanation of the inset.

      (7) Discussion. In the introduction, the authors mention projections from RA to HVC but never end up studying them in the current manuscript which seems like a missed opportunity and perhaps even a weakness of the study. In the discussion, it would certainly be good for the authors to at least discuss the possible significance of these projections and perhaps why they decided not to study them.

      We thank the reviewer for the comment. Unfortunately, we couldn’t reliably evoke interpretable currents from RA, and we elected to publish the current version of the paper with these 4 major inputs. Nonetheless, we have indicated in the Introduction and in the Discussion that more inputs (e.g. RA, A11, NCM) remain to be evaluated. 

      (8) Line 622 - Is this reference incomplete?

      We thank the reviewer. We have corrected the reference.

      • Ben-Tov, M., F. Duarte and R. Mooney (2023). "A neural hub for holistic courtship displays." Curr Biol 33(9): 1640-1653 e1645.

      • Bliss, T. V. and A. R. Gardner-Medwin (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the unanaestetized rabbit following stimulation of the perforant path." J Physiol 232(2): 357-374.

      • Bliss, T. V. and T. Lomo (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." J Physiol 232(2): 331-356.

      • Foster, E. F. and S. W. Bottjer (2001). "Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults." J Neurobiol 46(2): 142-165.

      • Koparkar, A., T. L. Warren, J. D. Charlesworth, S. Shin, M. S. Brainard and L. Veit (2024). "Lesions in a songbird vocal circuit increase variability in song syntax." Elife 13.

      • Linders, L. E., L. F. Supiot, W. Du, R. D'Angelo, R. A. H. Adan, D. Riga and F. J. Meye (2022). "Studying Synaptic Connectivity and Strength with Optogenetics and Patch-Clamp Electrophysiology." Int J Mol Sci 23(19).

      • Liu, H. N., T. Kurotani, M. Ren, K. Yamada, Y. Yoshimura and Y. Komatsu (2004). "Presynaptic activity and Ca2+ entry are required for the maintenance of NMDA receptor-independent LTP at visual cortical excitatory synapses." J Neurophysiol 92(2): 1077-1087.

      • Louder, M. I. M., M. Kuroda, D. Taniguchi, J. A. Komorowska-Muller, Y. Morohashi, M. Takahashi, M. Sanchez-Valpuesta, K. Wada, Y. Okada, H. Hioki and Y. Yazaki-Sugiyama (2024). "Transient sensorimotor projections in the developmental song learning period." Cell Rep 43(5): 114196.

      • Pastalkova, E., P. Serrano, D. Pinkhasova, E. Wallace, A. A. Fenton and T. C. Sacktor (2006). "Storage of spatial information by the maintenance mechanism of LTP." Science 313(5790): 1141-1144.

      • Petreanu, L., T. Mao, S. M. Sternson and K. Svoboda (2009). "The subcellular organization of neocortical excitatory connections." Nature 457(7233): 1142-1145.

      • Roberts, T. F., S. M. Gobes, M. Murugan, B. P. Olveczky and R. Mooney (2012). "Motor circuits are required to encode a sensory model for imitative learning." Nat Neurosci 15(10): 1454-1459.

      • Roberts, T. F., E. Hisey, M. Tanaka, M. G. Kearney, G. Chattree, C. F. Yang, N. M. Shah and R. Mooney (2017). "Identification of a motor-to-auditory pathway important for vocal learning." Nat Neurosci 20(7): 978-986.

      • Roberts, T. F., M. E. Klein, M. F. Kubke, J. M. Wild and R. Mooney (2008). "Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song." J Neurosci 28(13): 3479-3489.

      • Trusel, M., A. Cavaccini, M. Gritti, B. Greco, P. P. Saintot, C. Nazzaro, M. Cerovic, I. Morella, R. Brambilla and R. Tonini (2015). "Coordinated Regulation of Synaptic Plasticity at Striatopallidal and Striatonigral Neurons Orchestrates Motor Control." Cell Rep 13(7): 1353-1365.

      • Trusel, M., A. Nuno-Perez, S. Lecca, H. Harada, A. L. Lalive, M. Congiu, K. Takemoto, T. Takahashi, F. Ferraguti and M. Mameli (2019). "Punishment-Predictive Cues Guide Avoidance through Potentiation of Hypothalamus-to-Habenula Synapses." Neuron 102(1): 120-127.e124.

      • Vates, G. E., B. M. Broome, C. V. Mello and F. Nottebohm (1996). "Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches." Journal of Comparative Neurology 366(4): 613-642.

      • Xu, T., X. Yu, A. J. Perlik, W. F. Tobin, J. A. Zweig, K. Tennant, T. Jones and Y. Zuo (2009). "Rapid formation and selective stabilization of synapses for enduring motor memories." Nature 462(7275): 915-919.

      • Zhao, W., F. Garcia-Oscos, D. Dinh and T. F. Roberts (2019). "Inception of memories that guide vocal learning in the songbird." Science 366: 83-89.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we respond to each of your specific comments, with the aim of making our methods and results as transparent and comprehensible as possible. We hope these revisions address your concerns and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles concern sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) focus mainly on EEG-fMRI methodology, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We will expand the Introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signal, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024), to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, even though 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials, containing descriptive information about sleep-related characteristics (Table S1) as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly, so they had essentially no NREM sleep period. It was therefore impossible to report NREM sleep duration or the numbers of SOs, spindles, and coupling events for them.

      Supplementary Materials, Page 42-54, Table S1-S4
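
      As a minimal sketch of how the per-subject densities listed above could be tabulated, the snippet below assumes that density means events per minute of scored sleep in a given stage; that definition, the function name, and the example records are illustrative assumptions and are not taken from Tables S1-S4. It also shows why no density can be reported for a participant without any NREM sleep.

```python
def event_density(n_events, duration_min):
    """Events per minute of scored sleep (assumed definition).

    Returns None when there is no scored sleep, since a density is undefined.
    """
    if duration_min <= 0:
        return None  # e.g., a participant with no NREM period
    return n_events / duration_min


# Hypothetical records; the numbers are illustrative only.
subjects = [
    {"id": "sub-A", "nrem_min": 40.0, "so": 120, "spindle": 200, "coupled": 30},
    {"id": "sub-B", "nrem_min": 0.0, "so": 0, "spindle": 0, "coupled": 0},
]

# One density per event type and subject.
densities = {
    s["id"]: {
        kind: event_density(s[kind], s["nrem_min"])
        for kind in ("so", "spindle", "coupled")
    }
    for s in subjects
}

for subject_id, row in densities.items():
    print(subject_id, row)
```

      Returning None instead of zero keeps "no sleep recorded" distinct from "sleep recorded but no events detected" in any downstream descriptive statistics.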

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript. 

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using a k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm³, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those data, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the detection methods for SOs and spindles are percentile-based, they will always detect a certain number of events when applied to data from other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
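
      The point about percentile-based detection can be illustrated with a small sketch. It is not the detection pipeline used in the study: the 75th-percentile cutoff, the function name, and the simulated amplitudes are all assumptions chosen for illustration. Because the threshold is relative to whatever data are supplied, roughly the same fraction of candidates is flagged whether the amplitudes are large (N2/N3-like) or small (N1/REM-like).

```python
import random
import statistics


def detect_by_percentile(amplitudes, pct=75):
    """Return indices of candidates whose amplitude exceeds the pct-th percentile.

    The threshold is computed from the supplied data, so it adapts to their scale.
    """
    # quantiles(n=100) yields 99 cut points; element pct-1 is the pct-th percentile.
    threshold = statistics.quantiles(amplitudes, n=100)[pct - 1]
    return [i for i, a in enumerate(amplitudes) if a > threshold]


rng = random.Random(0)
# Simulated candidate amplitudes in arbitrary units (illustrative only).
n23_like = [rng.gauss(80.0, 20.0) for _ in range(1000)]  # large-amplitude stage
rem_like = [rng.gauss(20.0, 5.0) for _ in range(1000)]   # small-amplitude stage

n23_events = detect_by_percentile(n23_like)
rem_events = detect_by_percentile(rem_like)

# Both stages yield about a quarter of their candidates as "events".
print(len(n23_events), len(rem_events))
```

      Under these assumptions the event count alone cannot distinguish stages, which is why counts obtained outside N2/N3 are best read as a sanity check and not as evidence of genuine SOs or spindles.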

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the original analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived correction for multiple comparisons. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t(106) = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t(106) = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in all other pathways (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not isolated SOs or spindle events alone.”
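
      For readers less familiar with PPI, the core idea of the interaction term can be sketched as follows. This is a deliberately simplified illustration, not the analysis pipeline used in the study: it omits HRF convolution and the deconvolution step that standard PPI implementations apply, and the function names and toy values are hypothetical.

```python
def mean_center(values):
    """Subtract the mean so main effects and the interaction are less confounded."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ppi_design_matrix(seed_signal, event_regressor):
    """Build rows [intercept, seed, event, seed x event] for each scan.

    The last column is the psychophysiological interaction term: it asks
    whether the seed-target relationship changes when the event is present.
    """
    if len(seed_signal) != len(event_regressor):
        raise ValueError("seed and event regressors must have the same length")
    seed_c = mean_center(seed_signal)
    event_c = mean_center(event_regressor)
    return [[1.0, s, e, s * e] for s, e in zip(seed_c, event_c)]


# Hypothetical seed time course (e.g. a hippocampal ROI) and a 0/1 regressor
# marking scans that contain SO-spindle coupling events.
seed = [0.2, 1.1, 0.9, -0.4, 0.3, 1.5]
coupling = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
design = ppi_design_matrix(seed, coupling)
```

      Fitting the target ROI time course against such a design and testing the weight of the fourth column is what makes the comparison across coupled events, isolated SOs, and isolated spindles possible: each event type gets its own interaction regressor.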

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and there is indeed a high correlation between the two amplitudes across all subjects in our results (r = 0.98). Future studies could explore this further by leveraging high-temporal-resolution techniques. While implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
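
      To make the collinearity concern concrete, the sketch below computes the Pearson correlation between two modulators and the corresponding variance inflation factor, which for two regressors is 1 / (1 - r^2). The function names are illustrative; only the value r = 0.98 comes from our data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_inflation(r):
    """Variance inflation factor for a pair of regressors correlated at r."""
    return 1.0 / (1.0 - r * r)


# With the reported correlation between UP- and DOWN-state amplitudes:
vif = variance_inflation(0.98)
print(round(vif, 2))
```

      A correlation of 0.98 inflates the variance of each modulator's estimate by a factor of about 25, so separate UP- and DOWN-state effects would be estimated very imprecisely in a joint model.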

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity reflecting the SO down-state. We acknowledge that the original wording was somewhat misleading. Our revised interpretation is that the reduced DMN activity could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will revise the manuscript to delete this misleading sentence from the Introduction section and acknowledge in the Discussion that our results cannot conclusively demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions of the data seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Regarding the reported SO amplitudes (~25 µV): during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, our SO detection in NREM sleep (especially N2/N3) remains physiologically meaningful and is consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials, containing descriptive information about sleep-related characteristics (Table S1) as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”

      Supplementary Materials, Page 42-54, Table S1-S4
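
      The sanity check described above can be sketched as a small per-subject routine. This is an illustration under stated assumptions, not the code used to build the supplementary tables: it compares event densities (events per minute) rather than raw counts, pools N2 and N3 into one stage, and uses made-up numbers and hypothetical function names.

```python
def density_per_min(n_events, duration_min):
    """Event density in events per minute for one sleep stage."""
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    return n_events / duration_min


def sanity_check(counts, durations_min, reference_stage="N2/N3"):
    """Return (passed, densities) for one subject.

    passed is True when the reference stage has a higher event density
    than every other stage (e.g. N1 and REM).
    """
    densities = {
        stage: density_per_min(counts[stage], durations_min[stage])
        for stage in counts
    }
    reference = densities[reference_stage]
    passed = all(
        value < reference
        for stage, value in densities.items()
        if stage != reference_stage
    )
    return passed, densities


# Hypothetical values for a single subject.
counts = {"N1": 20, "N2/N3": 150, "REM": 30}
durations = {"N1": 10.0, "N2/N3": 60.0, "REM": 15.0}
passed, densities = sanity_check(counts, durations)
```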

      Reviewer #3 (Public review):

      Summary:

      Wang et al., examined the brain activity patterns during sleep, especially when locked to those canonical sleep rhythms such as SO, spindle, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found a variety of cognitive processing may be involved during SO-spindle coupling and during other sleep events. The authors next investigated the functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize the large sample size, the advanced analysis methods, and the richness of our neuroimaging and neural oscillation results. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we have moved the related content and results from Figure 3 in the main text to Supplementary Figure 7.

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). These differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.  

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”
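
      The phase convention discussed in the response above can be made explicit with a short sketch. It assumes the common convention in which phase 0 is the SO peak (UP-state), -π/2 is the down-to-up transition, and ±π is the trough; the landmark labels, function names, and example phases are illustrative and are not taken from the study's analysis code.

```python
import cmath
import math

# Assumed convention: 0 = SO peak, -pi/2 = down-to-up transition, pi = trough.
LANDMARKS = {
    "SO peak (UP-state)": 0.0,
    "down-to-up transition": -math.pi / 2,
    "up-to-down transition": math.pi / 2,
    "SO trough (DOWN-state)": math.pi,
}


def circular_mean(phases):
    """Return (preferred phase in radians, resultant vector length)."""
    if not phases:
        raise ValueError("phases must not be empty")
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(resultant), abs(resultant)


def circular_distance(a, b):
    """Smallest absolute angular difference between two phases."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))


def nearest_landmark(phase):
    """Name of the SO landmark closest to the given phase."""
    return min(LANDMARKS, key=lambda name: circular_distance(phase, LANDMARKS[name]))


# Hypothetical SO phases at spindle peaks, slightly before the SO peak.
example_phases = [-0.5, -0.4, -0.3]
preferred, strength = circular_mean(example_phases)
label = nearest_landmark(preferred)
```

      Under this convention a preferred phase of about -0.4 rad lies closer to the SO peak than to the -π/2 transition point, which is the distinction the revised wording is meant to capture.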

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." Without any references to working memory, and without delineating why WM is considered as an external task even working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why during SO or during memory consolidation, the DMN activity would be reduced? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state.

      To address your final question, we conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t(106) = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”
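
      As a reading aid for the t(106) statistics above, the following sketch shows how a within-subject comparison across 107 participants yields 106 degrees of freedom. It assumes a paired t-test on per-subject ROI estimates; the function name and the simulated values are hypothetical and do not reproduce the reported numbers.

```python
import math
import random
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples.

    Assumes the per-subject differences are not all identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("a and b must have the same length (at least 2)")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


rng = random.Random(1)
# Simulated per-subject DMN estimates (arbitrary units) for 107 participants.
dmn_so = [rng.gauss(-0.5, 1.0) for _ in range(107)]
dmn_coupling = [rng.gauss(0.0, 1.0) for _ in range(107)]
t_value, df = paired_t(dmn_so, dmn_coupling)
```

      A negative t value in this arrangement means the first condition (isolated SOs) shows lower DMN activity than the second (SO-spindle coupling), matching the direction reported in the quoted Results text.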

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Response: Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) duration of each sleep stage; 2) number of detected SOs; 3) number of detected spindles; 4) number of detected SO-spindle coupling events; 5) density of detected SOs; 6) density of detected spindles; 7) density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4
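      The per-stage densities listed above can be illustrated with a minimal sketch. It assumes density means events per minute of time spent in the stage, which is a common convention but is not stated in the response; the function name and the counts are hypothetical and are not taken from Table S2-S4.

```python
def event_density(n_events: int, stage_duration_min: float) -> float:
    """Events per minute of sleep within one stage.

    Assumption: density = count / stage duration in minutes.
    Returns 0.0 when the subject spent no time in the stage.
    """
    if stage_duration_min <= 0:
        return 0.0
    return n_events / stage_duration_min


# Hypothetical counts for one subject (illustrative values only).
stage = {"duration_min": 40.0, "so": 120, "spindle": 200, "coupling": 30}

densities = {
    name: event_density(stage[name], stage["duration_min"])
    for name in ("so", "spindle", "coupling")
}
```

      Guarding against a zero stage duration matters here because not every subject necessarily reaches every stage during a nap.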

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
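      The epoching and baseline-correction steps in the quoted Methods text can be sketched as follows. This is a single-channel, single-subject illustration in plain Python, assuming the event times are already available as sample indices; the function names and the sampling rate argument are hypothetical, and the final averaging across the 107 subjects is not shown.

```python
from statistics import fmean


def extract_epoch(signal, event_idx, sfreq, tmin=-2.0, tmax=2.0, base=(-2.0, -0.5)):
    """Cut one epoch around an event and subtract the baseline mean.

    signal    : list of EEG samples (assumed already band-limited to 0.1-30 Hz)
    event_idx : sample index of the zero-time point (SO trough or spindle peak)
    sfreq     : sampling rate in Hz
    Returns None if the window would run past either end of the recording.
    """
    start = event_idx + round(tmin * sfreq)
    stop = event_idx + round(tmax * sfreq) + 1  # include the sample at tmax
    if start < 0 or stop > len(signal):
        return None
    epoch = signal[start:stop]
    # Baseline window expressed as indices within the epoch.
    b0 = round((base[0] - tmin) * sfreq)
    b1 = round((base[1] - tmin) * sfreq) + 1
    baseline = fmean(epoch[b0:b1])
    return [x - baseline for x in epoch]


def average_erp(signal, event_indices, sfreq, **kwargs):
    """Average all valid baseline-corrected epochs sample by sample."""
    epochs = [extract_epoch(signal, i, sfreq, **kwargs) for i in event_indices]
    epochs = [e for e in epochs if e is not None]
    if not epochs:
        return []
    return [fmean(samples) for samples in zip(*epochs)]
```

      Events too close to the edges of the recording are dropped rather than padded, so every epoch contributing to the average has the same length.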

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in the response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted the amplitude percentile threshold for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that, as the threshold initially became stricter, the results were similar to or even better than the current results. However, as we continued to increase the threshold, hippocampal activation gradually weakened, and once the threshold reached 80% the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
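      The trade-off described above, where a stricter percentile threshold leaves fewer events for the fMRI model, can be illustrated with a small sketch. It assumes candidate SOs are kept when their amplitude exceeds a percentile of all candidate amplitudes; the nearest-rank percentile, the function names, and the example thresholds are illustrative assumptions, not the detection pipeline used in the study.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def select_events(amplitudes, pct):
    """Keep candidate events whose amplitude exceeds the pct-th percentile."""
    cutoff = percentile(amplitudes, pct)
    return [a for a in amplitudes if a > cutoff]


def sweep(amplitudes, thresholds):
    """Number of surviving events at each percentile threshold."""
    return {pct: len(select_events(amplitudes, pct)) for pct in thresholds}


# Hypothetical candidate amplitudes (arbitrary units), for illustration only.
candidates = list(range(1, 101))
counts = sweep(candidates, [75, 80, 90])
```

      Because the threshold is a percentile of the candidates themselves, some events always survive whatever the sleep stage, which is the caveat the response raises for N1 and REM.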

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742. 

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. 

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309. 

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71. 

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118. 

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PloS one, 4(8), e6749. 

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241. 

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126. 

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PloS one, 12(4), e0174755. 

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. eLife, 9, e53730.

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120. 

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667. 

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572. 

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230.e224.

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682. 

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185. 

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796. 

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769. 

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75. 

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579. 

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870. 

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98. 

      Mölle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421.

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. eLife, 9, e57011.

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical–subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72.

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169. 

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature Communications, 13(1), 5231.

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112. 

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119. 

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387. 

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686. 

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9. 

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670. 

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.

      Strengths:

      (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.

      (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.

      (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.

      (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.

      (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.

      (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.

      We thank the reviewer for describing the study’s strengths, reflecting the major conclusions of the initially submitted manuscript. However, based on new analyses, including the requested analyses of other scRNA-seq datasets, our revision clarifies that:

      - related to point (1), cone and rod transcripts do not appear to be mixed together at first (i.e., in immediately post-mitotic immature cone and rod precursors) but appear to be co-expressed in subsequent cone and rod precursor stages; and

      - related to point (3), CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).

      Weaknesses:

      (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.

      (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.

      We thank the reviewer for noting these important issues. Accordingly, in the revised manuscript:

      (1) We improve the writing and clarify the nomenclature and contradictory statements, particularly those noted in the Reviewer’s Recommendations for Authors. 

      (2) We scale back claims related to the role of SYK in the cone precursor response to RB1 loss, with wording changes in the Abstract, Results, and Discussion, which now recognize that the inhibitor studies only support the possibility that cone-intrinsic SYK expression contributes to retinoblastoma initiation, as detailed in our responses to Reviewer’s Recommendations for Authors. We agree and now mention that genetic perturbation of SYK is required to prove its role.  

      Reviewer #2 (Public review):

      Summary:

      The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.

      Strengths:

      This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.

      We thank the reviewer for summarizing the main findings and noting the compelling support for the conclusions, the intriguing cell type-dependent splicing of rod and cone lineage factors, and the insights into retinoblastoma development.  

      Weaknesses:

      The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.

      We agree that the manuscript covers a range of topics resulting from the full-length scRNA-seq analyses and concur that some studies of developing photoreceptors were not well connected to retinoblastoma. However, we also note that the connection to retinoblastoma is emphasized in several places in the Introduction and throughout the manuscript and was a significant motivation for pursuing the analyses. We suggest that it was valuable to highlight how deep, full-length scRNA-seq of developing retina provides insights into retinoblastoma, including i) the similar biased expression of NRL transcript isoforms in cone precursors and RB tumors, ii) the cone precursors’ co-expression of rod- and cone-related genes such as NR2E3 and GNAT2, which may explain similar co-expression in RB cells, and iii) the expression of SYK in early cones and RB cells. While the earlier version had mainly highlighted point (iii), the revised Discussion further refers to points (i) and (ii) as described further in the response to the Reviewer’s Recommendations for Authors.

      We address the Reviewer’s request to cross-validate our findings with those of other single-cell studies of developing human retina by relating the photoreceptor-related cell populations identified in our study to those characterized by Zuo et al. (PMID 39117640). That study was specifically highlighted by the reviewer and is especially useful for such cross-validation given its extraordinarily large dataset of ~220,000 cells, which covers a wide range of retinal ages (pcw 8–23) and is spatiotemporally stratified by macular or peripheral retina location. Relevant analyses of the Zuo et al. dataset are shown in Supplementary Figures S3G-H, S10B, S11A-F, and S13A,B.

      Reviewer #3 (Public review):

      Summary:

      The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.

      Strengths:

      The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.

      Human developing fetal retinal tissue samples were collected between 13-19 gestational weeks and this provides a substantially higher depth of sequencing coverage, thereby identifying both rare transcripts and alternative splice forms, and thereby representing an important advance over previous droplet-based scRNA-Seq studies of human retinal development.

      Weaknesses:

      The weaknesses identified are relatively minor. This is a technically strong and thorough study, that is broadly useful to investigators studying retinal development and retinoblastoma.

      We thank the reviewer for describing the strengths of the study. Our revision addresses the concerns raised separately in the Reviewer’s Recommendations for Authors, as detailed in the responses below.  

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers have completed their reviews. Generally, they note that your work is important and that the evidence is generally convincing. The reviewers are in general agreement that the paper adds to the field. The findings of rod/cone fate determination at a very early stage are intriguing. Generally, the paper would benefit from clarifications in the writing and figures. Experimentally, the paper would benefit from validation of the drug data, for example using RNAi or another assay. Alternatively, the authors could note the caveats of the drug experiments and describe how they could be improved. In terms of analysis, the paper would be improved by additional comparisons of the authors' data to previously published datasets.

      We thank the reviewing editor for this summary. As described in the individual reviewer responses, we clarify the writing and figures and provide comparisons to previously published datasets (in particular, the large snRNA-seq dataset of Zuo et al., 2024 (PMID 39117640)). With regard to the drug (i.e., SYK inhibitor) studies, we opted to provide caveats and describe the need for genetic approaches to validate the role of SYK, owing to the infeasibility of completing genetic perturbation experiments in the appropriate timeframe. We are grateful for the opportunity to present our findings with appropriate caveats.

      Reviewer #1 (Recommendations for the authors):

      Shayler et al. sort human progenitor/rod/cone populations and then apply full-length single-cell RNA-seq to expose features that distinguish paths towards rods or cones. They initially distinguish progenitors (RPCs), immature photoreceptor precursors (iPRPs), long/medium wavelength (LM) cones, late-LM cones, short wavelength (S) cones, early rods (ER) and late rods (LR), which exhibit distinct transcription factor regulons (Figures 1, 2). These data expose expected and novel enriched genes, and support the notion that S cones are a default state lacking expression of rod (NRL) or cone (THRB) determinants but retaining expression of generic photoreceptor drivers (CRX/OTX2/NEUROD1 regulons). They identify changes in regulon activity, such as increasing NRL activity from iPRP to ER to LR, but decreasing from iPRP to cones, or increasing RAX/ISL2/THRB regulon activity from iPRP to LM cones, but decreasing from iPRP to S cones or rods.

      They report co-expression of rod/cone determinants in LM and ER clusters, and the ratios are in the expected directions (NRL < THRB or RXRG in LM, NRL > THRB or RXRG in ER). A novel insight from the FL seq is that there are differing variants generated in each cell population. Full-length NRL (FL-NRL) predominates in the rod path, whereas truncated NRL (Tr-NRL) does so in the cone path, then similar (but opposite) findings are presented for THRB (Fig 3, 4), whereas isoforms are not a feature of RXRG expression, just the higher expression in cones.

      The authors then further subcluster and perform RNA velocity to uncover decision points in the tree (Figure 5). They identify two photoreceptor precursor streams, the Transitional Rods (TRs) that provide one source for rod maturation and (reusing the name from the initial clustering) iPRPs that form cones, but also provide a second route to rods. TR cells closest to RPCs (immediately post-mitotic) have higher levels of the rod determinant NR2E3 and NRL, whereas the higher resolution iPRPs near RPCs lack NR2E3 and have higher levels of ONECUT1, THRB, and GNAT2, a cone bias. These distinct rod-biased TR and cone-biased high-resolution iPRPs were not evident in published scRNAseq with 3′ end-counting (i.e. not FL seq). Regulon analysis confirmed higher NRL activity in TR cells, with higher THRB activity in high-resolution iPRP cells.

      Many of the more mature high-resolution iPRPs show combinations of rod (GNAT1, NR2E3) and cone (GNAT2, THRB) paths as well as both NRL and THRB regulons, but with a bias towards cone-ness (Figure 6). Combined FISH/immunofluorescence in fetal retina uncovers cone-biased RXRG-protein-high/NR2E3-protein-absent cone-fated cells that nevertheless expressed NR2E3 mRNA. Thus early cone-biased iPRP cells express rod gene mRNA, implying a rod-cone hybrid in early photoreceptor development. The authors refer to these as "bridge region iPRP cells".

      In Figure 7, they identify CHRNA1 as the most specific marker of these bridge cells (overlapping with ATOH7 and DLL3, previously linked to cone-biased precursors), and FISH shows it is expressed in rod-biased NRL protein-positive and cone-biased RXRG protein-positive cones at fetal week 12.

      Figure 8 outlines the graded expression of various lncRNAs during cone maturation, a novel pattern.

      Finally (Figure 9), the authors identify differential genes expressed in early rods (ER cluster from Figure 1) vs early cones (LM cluster, excluding the most mature opsin+ cells), revealing high levels of MYCN targets in cones. They also find SYK expression in cones. SYK was previously linked to retinoblastoma, so intrinsic expression may predispose cone precursors to transformation upon RB loss. They finish by showing that a SYK inhibitor blocks the proliferation of dividing RB1 knockdown cone precursors in the human fetal retina.

      Overall, the authors have uncovered interesting patterns of biased expression in cone/rod developmental paths, especially relating to the isoform differences for NRL and THRB which add a new layer to our understanding of this fate choice. The analyses also imply that very soon after RPCs exit the cell cycle, they generate post-mitotic precursors biased towards a rod or cone fate, that carry varying proportions of mixed rod/cone determinants and other rod/cone marker genes. They also introduce new markers that may tag key populations of cells that precede the final rod/cone choice (e.g. CHRNA1), catalogue a new lncRNA gradient in cone maturation, and provide insight into potential genes that may contribute to retinoblastoma initiation, like SYK, due to intrinsic expression in cone precursors. However, as detailed below, the text needs to be improved considerably, and overinterpretations need to be moderated, removed, or tested more rigorously with extra data.

      Major Comments

      The manuscript is very difficult to follow. The nomenclature is at times torturous, and the description of hybrid rod/cone cells is confusing in many aspects.

      (1) A single term, iPRP, is used to refer to an initial low-resolution cluster, and then to a subset of that cluster later in the paper.

      We agree that using immature photoreceptor precursor (iPRP) for both high-resolution and low-resolution clusters was confusing. We kept this name for the low-resolution cluster (which includes both immature cone and immature rod precursors), renamed the high-resolution iPRP cluster immature cone precursors (iCPs), and renamed their transitional rod (TR) counterparts immature rod precursors (iRPs). These designations are based on:

      - the biased expression of THRB, ONECUT1, and the THRB regulon in iCPs (Fig. 5D,E);

      - the biased expression of NRL, NR2E3, and the NRL regulon in iRPs (Fig. 5D,E);

      - the partially distinct iCP and iRP UMAP positions (Figure 5C); and 

      - the evidence of similar immature cone versus rod precursor populations in the Zuo et al 3’ snRNA-seq dataset, as noted below and described in two new paragraphs starting at the bottom of p. 12.

      (2) To complicate matters further, the reader needs to understand the subset within the iPRP referred to as bridge cells, and we are told at one point that the earliest iPRPs lack NR2E3, then that they later co-express NR2E3, and while the authors may be referring to protein and RNA, it serves to further confuse an already difficult to follow distinction. I had to read and re-read the iPRP data many times, but it never really became totally clear.

      We agree that the description of the high-resolution iPRP (now “iCP”) subsets was unclear, although our further analyses of a large 3’ snRNA-seq dataset in Figure S11 support the impression given in the original manuscript that the earliest iCPs lack NR2E3 and then later co-express NR2E3 while the earliest iRPs lack THRB and then later express THRB. As described in new text in the Two post-mitotic immature photoreceptor precursor populations section (starting on line 7 of p. 13):

      When considering only the main cone and rod precursor UMAP regions, early (pcw 8 – 13) cone precursors expressed THRB and lacked NR2E3 (Figure S11D,E, blue arrows), while early (pcw 10 – 15) rod precursors expressed NR2E3 and lacked THRB (Figure S11D,E, red arrows), similar to RPC-localized iCPs and iRPs in our study (Figure 5D).

      Next, as summarized in new text in the Early cone and rod precursors with rod- and cone-related RNA co-expression section (new paragraph at top of p. 16):

      Thus, a 3’ snRNA-seq analysis confirmed the initial production of immature photoreceptor precursors with either L/M cone-precursor-specific THRB or rod-precursor-specific NR2E3 expression, followed by lower-level co-expression of their counterparts, NR2E3 in cone precursors and THRB in rod precursors. However, in the Zuo et al. analyses, the co-expression was first observed in well-separated UMAP regions, as opposed to a region that bridges the early cone and early rod populations in our UMAP plots. These findings are consistent with the notion that cone- and rod-related RNA co-expression begins in already fate-determined cone and rod precursors, and that such precursors aberrantly intermixed in our UMAP bridge region due to their insufficient representation in our dataset.  

      Importantly, and as noted in our ‘Public response’ to Reviewer 1, “CHRNA1 appears to mark immature cone precursors that are distinct from the maturing cone and rod precursors that co-express cone- and rod-related RNAs (despite the similar UMAP positions of the two populations in our dataset).” In support of this notion, the immature cone precursors expressing CHRNA1 and other populations did not overlap in UMAP space in the Zuo et al. dataset. We hope the new text cited above along with other changes will significantly clarify the observations.

      (3) The term "cone/rod precursor" shows up late in the paper (page 12), but it was clear (was it not?) much earlier in this manuscript that cone and rod genes are co-expressed because of the co-expressed NRL and THRB isoforms in Figures 3/4.

      We thank the reviewer for noting that the differential NRL and THRB isoform expression already implies that cone and rod genes are co-expressed. However, as we now state, the co-expression of RNAs encoding an additional cone marker (GNAT2) and rod markers (GNAT1, NR2E3) was 

      “suggestive of a proposed hybrid cone/rod precursor state more extensive than implied by the co-expression of different THRB and NRL isoforms” (first paragraph of “Early cone and rod …” section on p. 14; new text underlined).

      (4) The (incorrect) impression given later in the manuscript is that the rod/cone transcript mixture applies to just a subset of the iPRP cells, or maybe just the bridge cells (writing is not clear), but actually, neither of those is correct as the more abundant and more mature LM and ER populations analyzed earlier co-express NRL and THRB mRNAs (Figures 2, 3). Overall, the authors need to vastly improve the writing, simplify/clarify the nomenclature, and better label figures to match the text and help the reader follow more easily and clearly. As it stands, it is, at best, obtuse, and at worst, totally confusing.

      We thank the reviewer for bringing the extent of the confusing terminology and wording to our attention. We revised the terminology (as in our response to point 1) and extensively revised the text. We also performed similar analyses of the Zuo et al. data (as described in more detail in our response to Reviewer 2), which clarify the distinct status of cells with the “rod/cone transcript mixture” and cells co-expressing early cone and rod precursor markers.

      To more clearly describe data related to cells with rod- and cone-related RNA co-expression, we divided the former Figure 6 into two figures, with Figure 6 now showing the cone- and rod-related RNA co-expression inferred from scRNA-seq and Figure 7 showing GNAT2 and NR2E3 co-expression in FISH analyses of human retina plus a new schematic in the new panel 7E.

      To separate the conceptually distinct analyses of cone and rod related RNA co-expression and the expression of early photoreceptor precursor markers (which were both found in the so-called bridge region – yet now recognized to be different subpopulations), we separated the analyses of the early photoreceptor precursor markers to form a new section, “Developmental expression of photoreceptor precursor markers and fate determinants,” starting on p. 16. 

      Additionally, we further review the findings and their implications in four revised Discussion paragraphs (starting at the bottom of p. 23).

      (5) The data showing that overexpressing Tr-NRL in murine NIH3T3 fibroblasts blocks FL-NRL function is presented at the end of page 7 and in Figure 3G. Subsequent analysis two paragraphs and two figures later (end of page 8, Figure 5C + supp figs) reveals that Tr-NRL protein is not detectable in retinoblastoma cells, which derive from cone precursor cells and express Tr-NRL mRNA, and the protein is also not detected upon lentiviral expression of Tr-NRL in human fetal retinal explants, suggesting it is unstable or not translated. It would be preferable to have the 3T3 data and retinoblastoma/explant data juxtaposed. E.g. they could present the latter, then show with the 3T3 data that even if it were expressed (e.g. briefly) it would interfere with FL-NRL. The current order and spacing are somewhat confusing.

      We thank the reviewer for this suggestion and moved the description of the luciferase assays to follow the retinoblastoma and explant data and switched the order of Figure panels 3G and 3H.  

      (6) On page 15, regarding early rod vs early cone gene expression, the authors state: "although MYCN mRNA was not detected....", yet on the volcano plot in Figure S14A MYCN is one of the marked genes that is higher in cones than rods, meaning it was detected, and a couple of sentences later: "Concordantly, the LM cluster had increased MYCN RNA". The text is thus confusing.

      With respect, we note that the original text read, “although MYC RNA was not detected,” which related to a statement in the previous sentence that the gene ontology analysis identified “MYC targets.” However, given that this distinction is subtle and may be difficult for readers to recognize, we revised the text (now on p. 19) to more clearly describe expression of MYCN (but not MYC) as follows:

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also targets of MYCN, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss[8–10]. Indeed, whereas MYC RNA was not detected, the LM cone cluster had increased MYCN RNA …”

      (7) The authors state that the SYK drug is "highly specific". They provide no evidence, but no drug is 100% specific, and it is possible that off-target hits are important for the drug phenotype. This data should be removed or validated by co-targeting the SYK gene along with RB1.

      We agree that our data only show the potential for SYK to contribute to the cone proliferative response; however, we believe the inhibitor study retains value in that a negative result (no effect of the SYK inhibitor) would disprove its potential involvement. To reflect this, we changed wording related to this experiment as follows:

      In the Abstract, we changed:

      (1) “SYK, which contributed to the early cone precursors’ proliferative response to RB1 loss” To: “SYK, which was implicated in the early cone precursors’ proliferative response to RB1 loss.”  

      (2) “These findings reveal … and a role for early cone-precursor-intrinsic SYK expression.” To:  “These findings reveal … and suggest a role for early cone-precursor-intrinsic SYK expression.”

      In the last paragraph of the Results, we changed:

      (1) “To determine if SYK contributes…” To:  “To determine if SYK might contribute…”

      (2) “the highly specific SYK inhibitor” To:  “the selective SYK inhibitor”  

      (3)  “indicating that cone precursor intrinsic SYK activity is critical to the proliferative response” To: “consistent with the notion that cone precursor intrinsic SYK activity contributes to the proliferative response.”

      In the Results, we added a final sentence: 

      “However, given potential SYK inhibitor off-target effects, validation of the role of SYK in retinoblastoma initiation will require genetic ablation studies.”

      In the Discussion (2nd-to-last paragraph), we changed: 

      “SYK inhibition impaired pRB-depleted cone precursor cell cycle entry, implying that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation.” To: “…the pRB-depleted cone precursors’ sensitivity to a SYK inhibitor suggests that native SYK expression rather than de novo induction contributes to the cone precursors’ initial proliferation, although genetic ablation of SYK is needed to confirm this notion.”

      In the last sentence of the Discussion, we changed:

      “enabled the identification of developmental stage-specific cone precursor features that underlie retinoblastoma predisposition.” To: “enabled the identification of developmental stage-specific cone precursor features that are associated with the cone precursors’ predisposition to form retinoblastoma tumors.”

      Minor/Typos

      Figure 7 legend, H should be D.

      We corrected the figure legend (now related to Figure 8).

      Reviewer #2 (Recommendations for the authors):

      (1) The author should take advantage of recently published human fetal retina data, such as PMID:39117640, which includes a larger dataset of cells that could help validate the findings. Consequently, statements like "To our knowledge, this is the first indication of two immediately post-mitotic photoreceptor precursor populations with cone versus rod-biased gene expression" may need to be revised.

      We thank the reviewer for noting the evidence of distinct immediately post-mitotic rod and cone populations published by others after we submitted our manuscript. In response, we omitted the sentence mentioned and extensively cross-checked our results including:

      - comparison of our early versus late cone and rod maturation states to the cone and rod precursor versus cone and rod states identified by Zuo et al (new paragraph on the top half of p. 6 and new figure panels S3G,H);

      - detection of distinct immediately post-mitotic versus later cone and rod precursor populations (two new paragraphs on pp. 12-13 and new Figures S10B and S11A-E); 

      - identification of cone and rod precursor populations that co-express cone and rod marker genes (two new paragraphs starting at the bottom of p. 15 and new Figures S11D-F);

      - comparison of expression patterns of immature cone precursor (iCP) marker genes in our and the Zuo et al dataset (new paragraph on top half of p. 17 and new Figure S13).

      We also compare the cell states discerned in our study and the Zuo et al. study in a new Discussion paragraph (bottom of p. 23) and new Figure S17.

      (2) The data generated comes from dissociated cells, which inherently lack spatial context. Additionally, it is unclear whether the dataset represents a pool of retinas from multiple developmental stages, and if so, whether the developmental stage is known for each cell profiled. If this information is available, the authors should examine the distribution of developmental stages on the UMAP and trajectory analysis as part of the quality control process. 

      We thank the reviewer for highlighting the importance of spatial context and developmental stage. 

      Related to whether the dataset represents a pool of retinae from multiple developmental stages, the different cell numbers examined at each time point are indicated in Figure S1A. To draw the readers’ attention to this detail, Figure S1A is now cited in the first sentence of the Results. 

      Related to the age-related cell distributions in UMAP plots, the distribution of cells from each retina and age was (and is) shown in Fig. S1F. In addition, we now highlight the age distributions by segregating the FW13, FW15-17, and FW17-18-19 UMAP positions in the new Figure 1C. We describe the rod temporal changes in a new sentence at the top of  p. 5:

      “Few rods were detected at FW13, whereas both early and late rods were detected from FW15-19 (Figure 1C), corroborating prior reports [15,20].”  

      We describe the cone temporal changes and note the likely greater discrimination of cell state changes that would be afforded by separately analyzing macula versus peripheral retina at each age in a new sentence at the bottom of p. 5:

      “L/M cone precursors from different age retinae occupied different UMAP regions, suggesting age-related differences in L/M cone precursor maturation (Figure 1C).”

      Moreover, they should assess whether different developmental stages impact gene expression and isoform ratios. It is well established that cone and rod progenitors typically emerge at different developmental times and in distinct regions of the retina, with minimal physical overlap. Grouping progenitor cells based solely on their UMAP positioning may lead to an oversimplified interpretation of the data.

      (2a) We agree that different developmental stages may impact gene expression and isoform ratios, and evaluated stages primarily based on established Louvain clustering rather than UMAP position. However, we also used UMAP position to segregate so-called RPC-localized and non-RPC-localized iCPs and iRPs, as well as to characterize the bridge region iCP sub-populations. In the revision, we examine whether cell groups defined by UMAP positions helped to identify transcriptomically distinct populations and further examine the spatiotemporal gene expression patterns of the same genes in the Zuo et al. 3’ snRNA-seq dataset.

      (2b) Related to analyses of immediately post-mitotic iRPs and iCPs, the new Figure S10A expanded the violin plots first shown in Figure 5D to compare gene expression in RPC-localized versus non-RPC-localized iCPs and iRPs and subsequent cone and rod precursor clusters (also presented in response to Reviewer 3). The new Figure S10C shows a similar analysis of UMAP region-specific regulon activities. These figures support the idea that there are only subtle UMAP region-related differences in the expression of the selected genes and regulons.

      To further evaluate early cone and rod precursors, we compared expression patterns in our cluster- and UMAP-defined cell groups to those of the spatiotemporally defined cell groups in the Zuo et al. 3’ snRNA-seq study. The results revealed similar expression timing of the genes examined, although the cluster assignments of a subset of cells were brought into question, especially the assigned rod precursors at pcw 10 and 13, as shown in new Figures S10B (grey columns) and S11, and as described in two new paragraphs starting near the bottom of p.12. 

      (2c) Related to analyses of iCPs in the so-called bridge region, our analyses of the Zuo et al. dataset helped distinguish early cone and rod precursor populations (expressing early markers such as ATOH7 and CHRNA1) from the later stages exhibiting rod- and cone-related gene co-expression, which had intermixed in the UMAP bridge region in our dataset. Further parsing of early cone precursor marker spatiotemporal expression revealed intriguing differences as now described in the second half of a new paragraph at the top of p. 17, as follows:

      “Also, different iCP markers had different spatiotemporal expression: CHRNA1 and ATOH7 were most prominent in peripheral retina with ATOH7 strongest at pcw 10 and CHRNA1 strongest at pcw 13; CTC-378H22.2 was prominently expressed from pcw 10-13 in both the macula and the periphery; and DLL3 and ONECUT1 showed the earliest, strongest, and broadest expression (Figure S13B). The distinct patterns suggest spatiotemporally distinct roles for these factors in cone precursor differentiation.”

      (3) I would commend the authors for performing a validation experiment via RNA in situ to validate some of the findings. However, drawing conclusions from analyzing a small number of cells can still be dangerous. Furthermore, it is not entirely clear how the subclustering is done. Some cells change cell type identities in the high-resolution plot. For example, some iPRP cells from the low-resolution plots in Figure 1 are assigned as TR in high-resolution plots in Figure 5.

      The authors should provide justification for the identities of the RPC-localized iPRP and TR clusters.

      Comparison of their data with other publicly available data should strengthen their annotation.

      We agree that drawing conclusions from scRNA-seq or in situ hybridization analysis of a small number of cells can be dangerous and have followed the reviewer’s suggestion to compare our data with other publicly available data, focusing on the 3’ snRNA-seq of Zuo et al. given its large size and extensive annotation. Our analysis of  the Zuo et al. dataset helped clarify cell identities by segregating cone and rod precursors with similar gene expression properties in distinct UMAP regions. However, we noted that the clustering of early cone and rod precursors likely gave numerous mis-assigned cells (as noted in response 2b above and shown in the new Figure S11). It would appear that insights may be derived from the combination of relatively shallow sequencing of a high number of cells and deep sequencing of substantially fewer cells. 

      Related to how subclustering was done, the Methods state, “A nearest-neighbors graph was constructed from the PCA embedding and clusters were identified using a Louvain algorithm at low and high resolutions (0.4 and 1.6)[70],” citing the Blondel et al. reference for the Louvain clustering algorithm used in the Seurat package. To clarify this, the Results text was revised such that it now indicates the levels used to cluster at low resolution (0.4, p. 4, 2nd paragraph) and at high resolution (1.6, top of p. 11).
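
      As an illustration of what the resolution parameter does, the following minimal Python sketch scores two candidate partitions of a toy graph with the resolution-scaled modularity that Louvain-type methods optimize. The graph, the function name `modularity`, and both partitions are hypothetical and are not taken from the study or from Seurat. Only the values 0.4 and 1.6 come from the Methods quoted above, and their effect on real data will differ from this toy case.

```python
# Toy graph (hypothetical): two triangles joined by three cross edges.
EDGES = [
    (0, 1), (0, 2), (1, 2),   # triangle A
    (3, 4), (3, 5), (4, 5),   # triangle B
    (0, 3), (1, 4), (2, 5),   # cross edges between A and B
]

MERGED = [{0, 1, 2, 3, 4, 5}]     # one large cluster
SPLIT = [{0, 1, 2}, {3, 4, 5}]    # two smaller clusters


def modularity(edges, partition, resolution):
    """Resolution-scaled modularity of an unweighted, undirected graph.

    Q = sum over clusters of  L_c / m  -  resolution * (d_c / (2 m)) ** 2
    where m is the number of edges, L_c the number of edges inside
    cluster c, and d_c the summed degree of the nodes in cluster c.
    """
    m = len(edges)
    cluster_of = {node: i for i, group in enumerate(partition) for node in group}
    internal = [0] * len(partition)
    degree = [0] * len(partition)
    for u, v in edges:
        degree[cluster_of[u]] += 1
        degree[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
        for c in range(len(partition))
    )


for gamma in (0.4, 1.6):
    q_merged = modularity(EDGES, MERGED, gamma)
    q_split = modularity(EDGES, SPLIT, gamma)
    preferred = "merged" if q_merged > q_split else "split"
    print(f"resolution={gamma}: merged={q_merged:.3f} split={q_split:.3f} -> {preferred}")
```

      On this toy graph the single merged cluster scores higher at 0.4 and the two-cluster split scores higher at 1.6, which is the general behavior that lets a higher resolution subdivide a cluster found at low resolution.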

      Related to the assignment of some iPRP cells from the low-resolution plots in Figure 1 to the TR cluster (now called the ‘iRP’ cluster) in the high-resolution plots in Figure 5, we suggest that this is consistent with Louvain clustering, which does not follow a single dendrogram hierarchy.

      The justification for referring to these groups as RPC-localized iCPs and iRPs relates to their biased gene and regulon expression in Fig. 5D and 5E, as stated on p. 12: 

      “In the RPC-localized region, iCPs had higher ONECUT1, THRB, and GNAT2, whereas iRPs trended towards higher NRL and NR2E3 (p= 0.19, p=0.054, respectively).”

      (4) Late-stage LM5 cluster Figure 9 is not defined anywhere in previous figures, in which LM clusters only range from 1 to 4. The inconsistency in cluster identification should be addressed.

      We revised the text related to this as follows: 

      “Indeed, our scRNA-seq analyses revealed that SYK RNA expression increased from the iCP stage through cluster LM4, in contrast to its minimal expression in rods (Figure 10E).  Moreover, SYK expression was abolished in the five-cell group with properties of late maturing cones (characterized in Figure 1E), here displayed separately from the other LM4 cells and designated LM5 (Figure 10E).”  (p. 19-20)

      (5) Syk inhibitor has been shown to be involved in RB cell survival in previous studies. The manuscript seems to abruptly make the connection between the single-cell data to RB in the last figure. The title and abstract should not distract from the bulk of the manuscript focusing on the rod and cone development, or the manuscript should make more connection to retinoblastoma.

      We appreciate the reviewer’s concern that the title may seem to over-emphasize the connection to retinoblastoma based solely on the SYK inhibitor studies. However, we suggest the title also emphasizes the identification and characterization of early human photoreceptor states, per se, and that there are a number of important connections beyond the SYK studies that could warrant the mention of cell-state-specific retinoblastoma-related features in the title.

      Most importantly, a prior concern with the cone cell-of-origin theory was that retinoblastoma cells express RNAs thought to mark retinal cell types other than cones, especially rods. The evidence presented here, that cone precursors also express rod-related genes, helps resolve this issue. The issue is noted numerous times in the manuscript, as follows:

      In the Introduction, we write:

      “However, retinoblastoma cells also express rod lineage factor NRL RNAs, which – along with other evidence – suggested a heretofore unexplained connection between rod gene expression and retinoblastoma development[12,13]. Improved discrimination of early photoreceptor states is needed to determine if co-expression of rod- and cone-related genes is adopted during tumorigenesis or reflects the co-expression of such genes in the retinoblastoma cell of origin.” (bottom, p. 2)

      And:

      “In this study, we sought to further define the transcriptomic underpinnings of human  photoreceptor development and their relationship to retinoblastoma tumorigenesis.” (last paragraph, p. 3)

      The Discussion also alluded to this issue and in the revised Discussion, we aimed to make the connection clearer.  We previously ended the 3rd-to-last paragraph with,  

      “iPRP [now iCP] and early LM cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin.” 

      We now separate and elaborate on this point in a new paragraph as follows: 

      “Our characterization of cone and rod-related RNA co-expression may help resolve questions about the retinoblastoma cell of origin. Past studies suggested that retinoblastoma cells co-express RNAs associated with rods, cones, or other retinal cells due to a loss of lineage fidelity[12]. However, the early L/M cone precursors’ expression of NR2E3 and NRL RNAs suggest that their presence in retinoblastomas[12,13] reflects their normal expression in the L/M cone precursor cells of origin. This idea is further supported by the retinoblastoma cells’ preferential expression of cone-enriched NRL transcript isoforms (Figure S5B).” (middle of p. 24)

      Based on the above, we elected to retain the title.

      Minor comments:

      (1) It is difficult to see the orange and magenta colors in the Fig 3E RNA-FISH image. The colors should be changed, or the contrast threshold needs to be adjusted to make the puncta stand out more.

      We re-assigned colors, with red for FL-NRL puncta and green for Tr-NRL puncta. 

      (2) Figure 5C on page 8 should be corrected to Supplementary Figure 5C.

      We thank the reviewer for noting this error and changed the figure citation.

      Reviewer #3 (Recommendations for the authors):

      (1) Minor concerns

      a. Abbreviation of some words needs to be included, example: FW. 

      We now provide abbreviation definitions for FW and others throughout the manuscript.  

      b. Cat # does not match the 'key resource table' for many reagents/kits. Some examples are: CD133-PE mentioned on Page # 22 on # 71, SMART-Seq V4 Ultra Low Input RNA Kit and SMARTer Ultra Low RNA Kit for the Fluidigm C1 System on Page # 22 on # 77, Nextera XT DNA Library preparation kit on Page # 23 on # 77.

      We thank the reviewer for noting these discrepancies. We have now checked all catalog numbers and made corrections as needed.

      c. The Cat # and brand name of a few reagents & kits are missing from the methods, the key resource table, or both. E.g.: FBS, Insulin, Glutamine, Penicillin, Streptomycin, HBSS, Quant-iT PicoGreen dsDNA assay, Nextera XT DNA Library Preparation Kit, 5' PCR Primer II A with CloneAmp HiFi PCR Premix.

      Catalog numbers and brand names are now provided for the tissue culture and related reagents within the methods text and for kits in the Key Resources Table. Additional descriptions of the primers used for re-amplification and RACE were added to the Methods (p. 28-29).

      d. A spelling and grammar check is needed throughout the manuscript. For example, on Page # 46, RXRγlo is misspelled as RXRlo.

      Spelling and grammar were checked throughout the manuscript.

      (2) Methods & Key Resource table.

      a. In Page # 21, IRB# needs to be stated.      

      The IRB protocols have been added, now at top of p. 26.

      b. In Page # 21, Did the authors dissociate retinae in ice-cold phosphate-buffered saline or papain?   

      The relevant sentence was corrected to “dissected while submerged in ice-cold phosphate-buffered saline (PBS) and dissociated as described[10].” (p. 26)

      c. In Page # 21, How did the authors count or enumerate the cell count? Provide the details.

      We now state, “… a 10 µl volume was combined with 10 µl trypan blue and counted using a hemocytometer” (top of p. 27)
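
      As a worked example of the arithmetic behind this counting step, the sketch below converts hemocytometer counts to a cell concentration. Mixing 10 µl of cells with 10 µl of trypan blue gives a two-fold dilution, which is the only value taken from the text above. The function name `cells_per_ml`, the example counts, and the assumption of a standard chamber in which each large square holds 0.1 µl (hence the factor of 10^4) are illustrative and are not stated in the manuscript.

```python
# Assumption: standard hemocytometer geometry, 0.1 µl per large square,
# so counts per square are multiplied by 1e4 to give cells per ml.
SQUARES_PER_ML = 1e4


def cells_per_ml(counts_per_square, dilution_factor=2.0):
    """Estimate cells/ml from per-square counts.

    counts_per_square: cells counted in each large square (hypothetical data).
    dilution_factor: 2.0 for a 1:1 mix of cell suspension and trypan blue.
    """
    if not counts_per_square:
        raise ValueError("at least one counted square is required")
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * dilution_factor * SQUARES_PER_ML


# Hypothetical counts from four large squares.
example_counts = [52, 48, 50, 50]
print(f"{cells_per_ml(example_counts):.0f} cells/ml")
```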

      d. Why did the authors choose to specifically use only 8 cells for cDNA preparation in Page # 22? State the reason and provide the details.

      The reasons for using 8 cells (to prevent evaporation and to manually transfer one slide-worth of droplets to one strip of PCR tubes) and additional single cell collection details are now provided as follows (new text underlined): 

      “Single cells were sorted on a BD FACSAria I at 4°C using 100 µm nozzle in single-cell mode into each of eight 1.2 µl lysis buffer droplets on parafilm-covered glass slides, with droplets positioned over pre-defined marks … .  Upon collection of eight cells per slide, droplets were transferred to individual low-retention PCR tubes (eight tubes per strip) (Bioplastics K69901, B57801) pre-cooled on ice to minimize evaporation. The process was repeated with a fresh piece of parafilm for up to 12 rounds to collect 96 cells.” (p. 27, new text underlined)

      e. Key resource table does not include several resources used in this study. Example - NR2E3 antibody.

      We added the NR2E3 antibody and checked for other omissions.

      (3) Results & Figures & Figure Legends

      a. Regulon-defined RPC and photoreceptor precursor states

      i. On page # 4, 1st paragraph - Clarify the sentence 'Exclusion of all cells with <100,000 cells read and 18 cells.........Ensembl transcripts inferred'. Did the authors use 18 cells or 18FW retinae?

      The sentence was changed to:

      “After sequencing, we excluded all cells with <100,000 read counts and 18 cells expressing one or more markers of retinal ganglion, amacrine, and/or horizontal cells (POU4F1, POU4F2, POU4F3, TFAP2A, TFAP2B, ISL1) and concurrently lacking photoreceptor lineage marker OTX2. This yielded 794 single cells with averages of 3,750,417 uniquely aligned reads, 8,278 genes detected, and 20,343 Ensembl transcripts inferred (Figure S1A-C).” (p. 4, new words underlined)
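
      The exclusion rule in the revised sentence can be read as the following filter, shown here as a minimal Python sketch. The read threshold and the marker genes are taken from the quoted text. The function name `keep_cell`, the dictionary layout, and the choice to treat any value above zero as "expressed" are assumptions made for illustration and may not match the authors' pipeline.

```python
MIN_READS = 100_000
# Retinal ganglion, amacrine, and horizontal cell markers named in the text.
OFF_TARGET_MARKERS = ("POU4F1", "POU4F2", "POU4F3", "TFAP2A", "TFAP2B", "ISL1")
LINEAGE_MARKER = "OTX2"


def keep_cell(read_count, expression):
    """Return True if a cell passes both exclusion criteria.

    read_count: read count for the cell.
    expression: dict mapping gene name -> expression value
                (assumption: a value > 0 counts as expressed).
    """
    if read_count < MIN_READS:
        return False
    expresses_off_target = any(
        expression.get(gene, 0) > 0 for gene in OFF_TARGET_MARKERS
    )
    lacks_otx2 = expression.get(LINEAGE_MARKER, 0) <= 0
    # Exclude only cells that express an off-target marker AND lack OTX2.
    return not (expresses_off_target and lacks_otx2)


# Hypothetical cells for illustration.
cells = {
    "low_depth": (50_000, {"OTX2": 5.0}),
    "ganglion_like": (2_000_000, {"POU4F2": 3.0}),
    "photoreceptor": (3_000_000, {"OTX2": 8.0, "ISL1": 1.0}),
}
kept = [name for name, (reads, expr) in cells.items() if keep_cell(reads, expr)]
print(kept)
```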

      To clarify that 18 retinae were used, the first sentence of the Results was revised as follows:

      “To interrogate transcriptomic changes during human photoreceptor development, dissociated RPCs and photoreceptor precursors were FACS-enriched from 18 retinae, ages FW13-19 …” (p. 4).

      Why did the authors 'exclude cells lacking photoreceptor lineage marker OTX2' from analysis especially when the purpose here was to choose photoreceptor precursor states & further results in the next paragraph clearly state that 5 clusters were comprised of cells with OTX2 and CRX expression. This is confusing.

      We apologize for the imprecise diction. We divided the evidently confusing sentence into two sentences to more clearly indicate that we removed cells that did not express OTX2, as in the first response to the previous question.

      ii. In Page # 5, the authors reported the number of cell populations (363 large and 5 distal) identified in the THRB+ L/M-cone cluster. What were the # of cell populations identified in the remaining 5 clusters of the UMAP space?

      We added the cell numbers in each group to Fig. 1B. We corrected the large LM group to 366 cells (p. 5) and note 371 LM cells, which includes the five distal cells, in Figure 1B.

      b. Differential expression of NRL and THRB isoforms in rod and cone precursors

      i. In Figure 3B, the authors compare and show the presence of 5 different NRL isoforms for all the 6 clusters that were defined in 3A. However, in the results, the ENST# of just 2 highly assigned transcript isoforms is given. What are the annotated names of the three other isoforms which are shown in 3B? Please explain in the Results.

      As requested, we now annotate the remaining isoforms as encoding full-length or truncated NRL in Fig. 3B and show isoform structures in new Supplementary Figure S4B.  We also refer to each transcript isoform in the Results (p. 7, last paragraph) and similarly evaluate all isoforms in RB31 cells (Fig. S5B).

      ii. What does the Mean FPM in the y-axis of Fig 3C refer to?

      Mean FPM represents mean read counts (fragments per million, FPM) for each position across Ensembl NRL exons for each cluster, as now stated in the 6th line of the Fig. 3 legend.
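
      One way to read this definition is sketched below in Python: per-position fragment counts for each cell are scaled to fragments per million of that cell's total, then averaged across the cells of a cluster. The per-cell scaling by total fragments, the function names, and the example numbers are assumptions for illustration. The legend text above does not specify the exact normalization used.

```python
def fpm(position_counts, total_fragments):
    """Scale raw per-position counts to fragments per million (FPM)."""
    return [count * 1e6 / total_fragments for count in position_counts]


def mean_fpm(cells):
    """Average FPM at each position across the cells of one cluster.

    cells: list of (position_counts, total_fragments) tuples, where all
           position_counts lists cover the same exon positions.
    """
    per_cell = [fpm(counts, total) for counts, total in cells]
    n_cells = len(per_cell)
    # zip(*per_cell) groups the values for each position across cells.
    return [sum(values) / n_cells for values in zip(*per_cell)]


# Hypothetical cluster of two cells covering three exon positions.
cluster = [
    ([10, 20, 0], 1_000_000),
    ([5, 0, 10], 500_000),
]
print(mean_fpm(cluster))
```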

      iii. A clear explanation of the results for Figures 3E-3F is missing.

      We revised the text to more clearly describe the experiment as follows:

      “The cone cells’ higher proportional expression of Tr-NRL first exon sequences was validated by RNA fluorescence in situ hybridization (FISH) of FW16 fetal retina in which NRL immunofluorescence was used to identify rod precursors, RXRg immunofluorescence was used to identify cone precursors, and FISH probes specific to truncated Tr-NRL exon 1T or FL-NRL exons 1 and 2 were used to assess Tr-NRL and FL-NRL expression (Figure 3E,F).” (p. 8, new text underlined).

      c. Two post-mitotic photoreceptor precursor populations

      i. Although deep-sequencing and SCENIC analysis clarified the identities of four RPC-localized clusters as MG, RPC, and iPRP indicative of cone-bias and TR indicative of rod-bias. It would be interesting to see the discriminating determinant between the TR and ER by SCENIC and deep-sequencing gene expression violin/box plots.

      We agree it is of interest to see the discriminating determinant between the TR [now termed iRP] and ER clusters by SCENIC and deep-sequencing gene expression violin/box plots. We now provide this information for selected genes and regulons of interest in the new Supplementary Figures S10A and S10C, along with a similar comparison between the prior high-resolution iPRP (now termed iCP) cluster and the first high-resolution LM cluster, LM1, as described for gene expression on p. 12:

      “Notably, THRB and GNAT2 expression did not significantly change while ONECUT1 declined in the subsequent non-RPC-localized iCP and LM1 stages, whereas NR2E3 and NRL dramatically increased on transitioning to the ER state (Figure S10A).”

      And as described for regulon activities on pp. 13-14:

      “Finally, activities of the cone-specific THRB and ISL2 regulons, the rod-specific NRL regulon, and the pan-photoreceptor LHX3, OTX2, CRX, and NEUROD1 regulons increased to varying extents on transitioning from the immature iCP or iRP states to the early-maturing LM1 or ER states (Figure S10C).”

      We also show expression of the same genes for spatiotemporally grouped cells from the Zuo et al. dataset in the new Figure S10B, which displays a similar pattern (apart from the possibly mixed pcw 10 and pcw 13 designated rod precursors).

      d. Early cone precursors with cone- and rod-related RNA expression

      i. On page #12, the last paragraph where the authors explain the multiplex RNA FISH results of RXRγ and NR2E3 by citing Figure S8E. However, in Fig S8E, the authors used NRL to identify the rods. Please clarify which one of the rod markers was used to perform RNA FISH?

      Figure S8E (where NRL was used as a rod marker) was cited to remind readers that RXRg has low expression in rods and high expression in cones, rather than to describe the results of this multiplex FISH section. To avoid confusion on this point, Figure S8E is now cited using “(as earlier shown in Figure S8E).” With this issue clarified, we expect the markers used in the FISH + IF analysis will be clear from the revised explanation, 

      “… we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .” (p. 14-15).

      To provide further clarity, we provide a diagram of the FISH probes, protein markers, and expression patterns in the new Figure 7E.

      ii. The Y-axis of Fig 6G-6H needs to be labelled.

      The axes have been re-labeled from “Nb of cells” to “Number of RXRg+ outermost NBL cells in each region” (original Fig. 6G, now Fig. 7C) and “Number of RXRg+ middle NBL cells in each region” (original Fig. 6H, now Fig. 7D).

      iii. The legends of Figures 6G and 6H are unclear. In the Figure 6G legend, the authors indicate 'all cells are NR2E3 protein-'. Does that imply the yellow and green bars alone? Similarly, clarify the Figure 6H legend, what does the dark and light magenta refer to? What does the light magenta color referring to NR2E3+/ NR2E3- and the dark magenta color referring to NR2E3+/ NR2E3+ indicate? 

      We regret the insufficient clarity. We revised the Fig. 6G (now Fig. 7C) key, which now reads

      “All outermost NBL cells are NR2E3 protein-negative.” We added to the figure legend for panels 7C,D “(n.b., italics are used for RNAs, non-italics for proteins).” The new scheme in Figure 7E shows RNAs in italics and proteins in non-italics. We hope these changes will clarify whether RNA or protein is represented in each histogram category.

      Overall, the results (on page # 13) reflecting Figures 6E-6H & Figure S11 are confusing and difficult to understand. Clear descriptions and explanations are needed.

      We revised this Results section, in the paragraph now on p. 14, as follows:

      -  We now refer to the bar colors in Figures 7C and 7D that support each statement. 

      -  We provide an illustration of the findings in Figure 7E.

      iv. Previously published literature has shown that cells of the inner NBL are RXRγ+ ganglion cells. So, how were these RXRγ+ ganglion cells in the inner NBL discriminated during multiplex RNA FISH (in Fig 6E-6H and in Fig S11)?

      We thank the reviewer for requesting this clarification. We agree that “inner NBL” is the incorrect term for the region in which we examined RXRg+ photoreceptor precursors, as this could include RXRγ+ nascent RGCs. We now clarify that 

      “we examined GNAT2 and NR2E3 RNA co-expression in RXRg+ cone precursors in the outermost NBL and in RXRg+ rod precursors in the middle NBL … .”  (p. 14-15) We further state, 

      “Limiting our analysis to the outer and middle NBL allowed us to disregard RXRγ+ retinal ganglion cells in the retinal ganglion cell layer or inner NBL” (top of p. 15).

      Figure 7E is provided to further aid the reader in understanding the positions examined, and the legend states “RXRg+ retinal ganglion cells in the inner NBL and ganglion cell layer not shown.”

      v. In Figure 6E, what marker does each color cell correspond to?

      In this figure (now panel 7A), we declined to provide the color key since the image is not sufficiently enlarged to visualize the IF and FISH signals. The figure is provided solely to document the regions analyzed and readers are now referred to “see Figure S12 for IF + FISH images” (2nd line, p. 15), where the marker colors are indicated.

      vi. In Figure S11 & 6E, Protein and RNA transcript color of NR2E3, GNAT2 are hard to distinguish. Usage of other colors is recommended.  

      We appreciate the reviewer’s concern related to the colors (in the now redesignated Figure S12 and 7A); however, we feel this issue is largely mitigated by our use of arrows to point to the cells needed to illustrate the proposed concepts in Figure S12B. All quantitation was performed by examining each color channel separately to ensure correct attribution, which is now mentioned in the Methods (2nd-to-last line of Quantitation of FISH section, p. 35).

      vii. 

      With due respect, we suggest that labeling each box (now in Figure 8B) makes the figure rather busy and obscures the main point, which is that boxed regions were examined at various distances from the center (denoted by the “C” and “0 mm”) with distances periodically indicated. We suggest the addition of such markers would not improve and might worsen the figure for most readers.

      e. An early L/M cone trajectory marked by successive lncRNA expression

      i. In Figure 8C - color-coded labelling of LM1-4 clusters is recommended.

      We note Fig. 8C (now 9C) is intended to use color to display the pseudotemporal positions of each cell. We recognize that an additional plot with the pseudotime line imposed on LM subcluster colors could provide some insights, yet we are unaware of available software for this and are unable to develop such software at present. To enable readers to obtain a visual impression of the pseudotime vs subcluster positions, we now refer the reader to Figure 5A in the revised figure legend, as follows:  (“The pseudotime trajectory may be related to LM1-LM4 subcluster distributions in Figure 5A.”).

      ii. In Figure 8G - what does the horizontal color-coded bar below the lncRNAs name refer to? These bars are similar in all four graphs of the 8G figure.

      As stated in the Fig. 8G (now 9G) legend, “Colored bars mark lncRNA expression regions as described in the text.”  We revised the text to more clearly identify the color code. (p. 18-19)   

      f. Cone intrinsic SYK contributions to the proliferative response to pRB loss

      i. In Fig 9F - The expression of ARR3+ cells (indicated by the green arrow in FW18) is poorly or rarely seen in the peripheral retina.

      We thank the reviewer for finding this oversight. In panel 9F (now 10F), we removed the green arrows from the cells in the periphery, which are ARR3- due to the immaturity of cones in this region. 

      ii. In Figure 9F - Did the authors stain the FW16 retina with ARR3?

      Unfortunately, we did not stain the FW16 retina for ARR3 in this instance.

      iii. Inclusion of DAPI staining for Fig 9F is recommended to justify the ONL & INL in the images.

      We regret that we are unable to merge the DAPI in this instance due to the way in which the original staining was imaged.  A more detailed analysis corroborating and extending the current results is in progress. 

      iv. Immunostaining images for Figure 9G are missing & are required to be included. What does shSCR in Fig 9G refer to?

      We now provide representative immunostaining images below the panel (now 10G). The legend was updated: “Bottom: Example of Ki67, YFP, and RXRg co-immunostaining with DAPI+ nuclei (yellow outlines). Arrows: Ki67+, YFP+, RXRg+ nuclei.”  The revised legend now notes that shSCR refers to the scrambled control shRNA.

      v. For Figure 9H - Is the presence and loss of SYK activity consistent with all the subpopulations (S & LM) of early maturing and matured cones?

      We appreciate the reviewer’s question and interest (relating to the redesignated Figure 10H); however, we have not yet completed a comprehensive evaluation of SYK expression in all the subpopulations (S & LM) of early maturing and matured cones and will reserve such data for a subsequent study. We suggest that this information is not critical to the study’s major conclusions.

      vi. Figure 9A is not explained in the results. Why were MYCN proteins assessed along with ARR3 and NRL? What does this imply?

      We thank the reviewer for noting that this figure (now Figure 10A) was not clearly described. 

      As per the response to Reviewer 1, point 6, the text now states,

      “The upregulation of MYC target genes was of interest given that many MYC target genes are also MYCN targets, that MYCN protein is highly expressed in maturing (ARR3+) cone precursors but not in NRL+ rods (Figure 10A), and that MYCN is critical to the cone precursor proliferative response to pRB loss [8–10].” (middle, p. 19, new text underlined).

      Hence, the figure demonstrates the cone cell specificity of high MYCN protein. This is further noted in the Fig. 10A legend: “A. Immunofluorescent staining shows high MYCN in ARR3+ cones but not in NRL+ rods in FW18 retina.”

    1. Parcoursup 2025: approaching the admission phase with confidence from June 2

      FCPE Nationale

      The main Parcoursup admission phase begins on Monday, June 2, 2025. Candidates will be able to see the programs' responses as they come in, and must reply to offers within the deadlines set by the platform.

      Here is the replay of the webinar held on May 26 with Jérôme Teillard, Parcoursup project lead at the ministry responsible for Higher Education and Research, along with an FAQ for future students and their parents.

      Detailed report of a meeting on the post-baccalauréat admission process, most likely via the Parcoursup platform.

      The main topics covered are how the admission phase works, the selection criteria, the possible types of response, waitlist management, the importance of candidates responding promptly, support and financial-aid schemes, and practical advice for high school students and their parents.

      Main Themes and Key Facts:

      1. The Selection Process and the Parcoursup Algorithm:

      It is clearly stated that Parcoursup does not review students' applications and does not choose their placements. It is the application review committees (commissions d'examen des vœux, CEV) of each program, public or private, that define the criteria, assess the applications, and draw up rankings. "It is never the Parcoursup algorithm that examines students' applications. Nor is it Parcoursup that chooses their placement. Within each program, whether public or private, it is an application review committee that has defined criteria, applied them, and assessed the applications." (00:04:01.230 - 00:04:20.770) These committees weigh academic results alongside more qualitative elements. More than 120,000 rankings are then sent up to Parcoursup, which at that point incorporates the legal priorities.

      2. Legal Priorities and Support Schemes:

      Priorities are granted to certain candidate profiles to ensure fairness and to support mobility or integration:

      Scholarship students (lycéens boursiers): "priority for scholarship students. As a reminder, for each program, including the most selective ones, there are minimum quotas of scholarship students" (00:05:09.130 - 00:05:20.290). Financial aid of 500 euros is offered to scholarship students who enroll in a program outside their home académie, to support mobility.

      Participation in the Cordées de la réussite: Taken into account by about 40% of programs in 2025.

      Vocational high school students: Priority places for access to BTS programs.

      Technological high school students: Priority places for the Bachelors Universitaires de Technologie (BUT) because "these are the programs in which they succeed best." (00:06:08.830 - 00:06:11.780)

      Access to oversubscribed bachelor's programs (licences en tension): Priority for candidates from the sector or académie, with geographic exceptions (e.g., Île-de-France).

      French high school students abroad: Given priority in all universities in metropolitan France and the overseas territories.

      3. The Admission Phase (from June 2):

      Response types for selective programs (BTS, BUT, CPGE, IFSI, etc.):

      "Oui" (yes): Offer of admission. The candidate has a deadline to accept or decline.

      "Oui si" (yes, if): Offer of admission on condition of following a support or remedial track (e.g., subject-specific modules). This is not a "no" but a "response with an alert to signal that reinforcement is needed". (00:21:48.610 - 00:21:57.520) Last year, 26,000 students were enrolled with a "oui si".

      "Non" (no): Negative response. The candidate can view a notification and request an explanation of the criteria and reasons for the decision. "Each student has one month from publication... to ask questions" (00:40:29.190 - 00:40:34.390).

      Response types for non-selective bachelor's programs (licences):

      "Oui" (yes): Admission.

      "Oui si" (yes, if): Conditional admission (see above).

      "En attente" (waitlisted): The candidate is on the waitlist. Movement on this list depends on the responses of higher-ranked candidates.

      Responses for apprenticeship wishes: The response is "application accepted subject to a contract." (00:41:24.870 - 00:41:28.410) The candidate must find an apprenticeship contract for the admission to take effect. The timeline is longer, running until September. Candidates are advised to accept a standard "student" offer while they look for an apprenticeship contract.

      4. Managing Responses and Waitlisted Wishes:

      Update frequency: "There is only one update, in the morning. So there is no point in logging in 50 times a day." (00:23:28.800 - 00:23:31.600)

      Response deadlines: The initial response deadline is short (e.g., until June 5 at 11:59 p.m. for the first days). A steadier rhythm then sets in.

      Importance of responding: "even when you don't want a program, you must not snub Parcoursup; you have to respond. You have the right to say yes or to say no; what I ask of you is to respond" (00:27:53.870 - 00:28:03.360).

      Ranking waitlisted wishes: Since 2022, candidates with waitlisted wishes have been invited to rank them in order of preference.

      This ranking is personal and is not disclosed to the programs.

      "The only question to ask yourself is: if I had an offer of admission on that wish, would I accept it, and would I give up the offer I have already accepted?" (01:05:27.640 - 01:05:37.760)

      Consequences of the ranking: Accepting a wish higher in the ranking means giving up the lower ones, and the lower-ranked waitlisted wishes are deleted.

      "If it is wish [one] and not wish two, that deletes number two." (01:29:23.480 - 01:29:24.590) "the aim is not to trap anyone; it is really so that someone who still had 15 days pending, if they get the offer that satisfies them, gives a lot back, which keeps the waitlists from being blocked by people who are waiting and would like to hold on to them indefinitely." (01:30:37.900 - 01:30:49.420)

      Possibility of receiving no offer: "Yes, there can be candidates... a third find themselves waitlisted on their wishes." (00:44:41.430 - 00:44:51.870) On average, however, students receive nearly six offers over the 33 days of the process.

      The complementary phase: A solution for those who have no offer or who want to change direction. It lets candidates submit new wishes for programs that still have places available. It is not possible to reapply to a program that has already turned you down, because the committee has already ruled.
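
      The ranking rule for waitlisted wishes described in this section (accepting an offer on a ranked wish gives up the previously accepted offer and deletes the lower-ranked waitlisted wishes) can be sketched in a few lines of Python. This is only an illustration of the rule as summarized in these notes, not Parcoursup's actual implementation; the `Candidate` class, the `accept_offer` function, and the wish names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    # Waitlisted wishes, ordered from most preferred (index 0) to least preferred.
    ranked_waitlist: List[str]
    # The offer currently accepted, if any.
    accepted: Optional[str] = None
    # Wishes given up as a consequence of the rule, kept for inspection.
    dropped: List[str] = field(default_factory=list)


def accept_offer(candidate: Candidate, wish: str) -> Candidate:
    """Return the candidate's state after accepting an offer on a ranked wish.

    Hypothetical model of the rule in the notes: the previously accepted
    offer is given up, waitlisted wishes ranked below `wish` are deleted,
    and wishes ranked above it stay on the waitlist.
    """
    if wish not in candidate.ranked_waitlist:
        raise ValueError(f"{wish!r} is not a ranked waitlisted wish")
    position = candidate.ranked_waitlist.index(wish)
    higher = candidate.ranked_waitlist[:position]
    lower = candidate.ranked_waitlist[position + 1:]
    previous = [candidate.accepted] if candidate.accepted is not None else []
    return Candidate(
        ranked_waitlist=higher,
        accepted=wish,
        dropped=candidate.dropped + previous + lower,
    )


if __name__ == "__main__":
    # Hypothetical wish names, for illustration only.
    before = Candidate(ranked_waitlist=["wish A", "wish B", "wish C"], accepted="wish X")
    after = accept_offer(before, "wish B")
    print(after)
```

      In this sketch, a candidate who holds "wish X" and receives an offer on "wish B" keeps only "wish A" on the waitlist, which is how the rule frees places for other candidates.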

      5. Tools and Support:

      Parcoursup practice site: Resources (videos, golden rules, quizzes, fictional scenarios) are available to help candidates understand how the admission phase works and what its rules are. "there is no black box." (00:14:10.490 - 00:14:13.560)

      Toll-free number and social media: Assistance is available.

      Keeping contact details up to date: Make sure contact information (phone number, email address) is correct in order to receive notifications. "The risk of not getting an offer comes from losing touch with one's application." (00:46:00.670 - 00:46:03.950)

      6. Special Cases:

      Failing the baccalauréat after the resit orals (rattrapage): If a candidate does not obtain the baccalauréat, the offer of admission is cancelled.

      Information about the school of origin: Programs can see which school the candidate comes from. However, a discriminatory geographic criterion is prohibited by law.

      Gap year (césure): It is possible to request a gap year. The candidate must first accept a program; the institution then decides whether to approve the gap-year project, based on its "credibility" and "coherence". (01:35:54.560 - 01:35:59.700) A césure differs from a sabbatical year because it involves enrollment in an institution and potentially a social status.

      In summary, the meeting aims to demystify the Parcoursup process by stressing that applications are assessed by people within the programs and by setting out the clear steps and rules of the admission phase. The emphasis is on candidates responding promptly and understanding the different options and forms of support available, so as to maximize their chances of joining the program of their choice.

    1. Video summary [00:00:01] - [00:23:45]:

      This video presents a webinar on the appeals and review committee (commission d'appel et de recours) in primary and secondary education in France. It explains the process for appealing decisions on promotion to the next grade or grade repetition, the key principles, and parents' rights in the French education system.

      Highlights:
      + [00:00:13] Introduction to the webinar
        * Presentation of the topic and the importance of appeals and review committees
        * Discussion of the role of class councils (conseils de classe) and teachers' councils (conseils des maîtres)
      + [00:01:00] The appeals committee in primary school
        * Explanation of how it works and the reasons for appealing
        * Details on promotion decisions and the conditions for repeating a grade
      + [00:03:10] The right of appeal and the PPRE
        * Importance of the right of appeal as a principle of administrative law
        * Mandatory implementation of a Programme Personnalisé de Réussite Éducative (PPRE) when a grade is repeated
      + [00:07:01] Appeal procedure and the role of parents
        * Process parents must follow to contest a decision
        * Advice on preparing and presenting the appeal before the committee
      + [00:10:46] Composition of the appeals committee
        * Description of the committee members and their impartial role
        * Importance of neutrality and objectivity in the appeal process
      + [00:14:01] Role of parent representatives
        * Support and guidance for families from parent representatives
        * Preparing parents to present their case before the committee

      Video summary [00:23:46] - [00:46:07]:

      This part of the PEEP webinar covers the appeals and review committee in primary and secondary education. It explains the role of the panel members, the procedures for handling cases, and the importance of reaching fair, well-reasoned decisions in the students' interest.

      Highlights:
      + [00:23:46] How the committee operates
        * Details on the length of sessions and the importance of preparation
        * Explanation of the roles of members and of those accompanying families
      + [00:27:45] Review of cases
        * Importance of complete files for decision-making
        * Procedure in the event of a procedural defect and the consequences for families
      + [00:31:26] Decision-making
        * Criteria for judging appeals and the importance of objective reasoning
        * Voting process and notification of decisions to families
      + [00:37:03] Appeals committees in secondary education
        * Differences from primary school and the importance of orientation stages (paliers d'orientation)
        * Role of medical and social documents in appeal decisions

      Video summary [00:46:08] - [01:08:00]:

      This video is a PEEP webinar on the appeals and review committee in primary and secondary education in France. It explains the process for appealing class council decisions about students' orientation, particularly when there is disagreement over the choice of specialties or tracks.

      Highlights:
      + [00:46:08] The role of the class council
        * It must not rule on the specialties or tracks chosen by the student
        * Class council errors can be corrected in the family's favor
      + [00:50:40] The appeal process
        * Families have a set period in which to appeal decisions
        * The appeals committee reviews the files and hears the families' arguments
      + [00:57:03] The composition of the appeals committee
        * It includes various members, among them parent representatives
        * The absence of certain members can influence the final decision
      + [01:07:01] The importance of orientation
        * Discussing orientation plans with students and families
        * Appeals committees must take into account class council errors and families' wishes

      Video summary [01:08:02] - [01:29:26]:

      This part of the webinar covers the appeals and review committee in primary and secondary education, with a focus on the role of parents and the procedures to follow.

      Highlights:
      + [01:08:02] The role of parents on the committee
        * Importance of speaking up for and defending the student's interests
        * Avoid defending the indefensible; focus on the student's success
      + [01:10:09] Confidentiality and preparation
        * Duty of confidentiality regarding deliberations and votes
        * Preparing parents for their intervention before the committee
      + [01:11:12] Special cases of final-year (terminale) students
        * Right to re-enroll in the school of origin after failing the baccalauréat
        * Possibility of changing schools for a fresh start
      + [01:14:11] Advice for parents
        * Prepare a solid case and avoid unrealistic promises
        * Importance of punctuality and of allowing for fatigue during deliberations

      Video summary [01:29:27] - [01:30:54]:

      This part of the webinar covers the appeals and review committee in primary and secondary education. It explains the hearing process, the chair's role in the event of a tied vote, and where the webinar resources can be found.

      Highlights:
      + [01:29:27] Hearing process
        * Ability to listen and read at the same time
        * Importance of following the hearing while reading the documents
      + [01:30:02] Role of the chair
        * In the event of a tied vote, the chair's vote counts double
        * Internal rules similar to those of other statutes
      + [01:30:21] Availability of resources
        * The slide decks are available in the resource center
        * The webinars are accessible on the Federation's YouTube channel

    1. Synthesis note: The forms of violence - A multifaceted analysis

      This synthesis explores the complex nature of violence, drawing on the arguments of Didier Fassin and on the historical and philosophical examples cited in the sources.

      It highlights the moral and political duality of violence, the various ethics of refusing violence, and the contemporary rereading of power relations and legitimacy.

      1. Violence: Moral Judgment and Ambiguous Political Reality

      Violence is intrinsically tied to a near-universal moral judgment of condemnation, whether it is domestic, colonial, or state violence. Its political reality, however, is ambiguous:

      • Concerted Denial: Violence is the object of general condemnation, which leads to denial; this denial may "concern society as a whole or particular groups".

      • Double Standard of Power: Those in power claim to prevent and combat violence, but "they turn away from it by protecting the perpetrators when they are powerful, yet come down hard on them when they belong to the working classes, racialized minorities, or foreign populations".

      This contradiction between moral discourse and political practice is fundamental.

      2. The Ethics of Refusing Violence: Diversity and Complexity

      The refusal of violence belongs to varied religious and philosophical traditions, but its meaning is polysemous and complex:

      • Religious Traditions: The "Sermon on the Mount" (Gospel of Matthew) is a canonical Christian expression of it:

      "You have heard that it was said, 'An eye for an eye and a tooth for a tooth.' But I tell you not to resist the evildoer; on the contrary, if someone strikes you on the right cheek, turn the other to him as well."

      This is the ethos of the first martyrs.

      • Philosophical Traditions: Éric Weil (1967) aims "to eliminate violence", regarding this as "the secret of philosophy".

      • Polysemy of Refusal: Does refusing violence mean refusing to commit it, to contribute to it, to submit to it, to see it, to display it, to talk about it? These refusals carry distinct moral and political meanings.

      • Legitimacy of Violence Against Oppression: The question arises whether a violent response to oppression can be legitimate, and in what form, both for resistance movements and for individual situations of mistreatment.

      3. Nonviolence: Manifestos and Pioneering Figures

      Several historical episodes illustrate the development of modern nonviolence:

      • "The Mask of Anarchy" by Percy Bysshe Shelley (1819): This poem is regarded as the first modern manifesto of nonviolence, written in reaction to the Peterloo massacre.

      It exhorts the crowd: "Rise like Lions after slumber / In unvanquishable number, / Shake your chains to earth like dew / Which in sleep had fallen on you / Ye are many, they are few".

      • Henry David Thoreau and Civil Disobedience: His essay "Resistance to Civil Government" (1849) advocates refusing to pay the federal tax in protest against slavery and the Mexican-American War.

      He condemns the structural violence of slavery and the colonial violence of conquest.

      Thoreau suggests that "Under a government which imprisons any unjustly, the true place for a just man is also a prison."

      For him, the refusal of a thousand citizens to pay the tax would "not be a violent and bloody measure, as it would be to pay them, and enable the State to commit violence and shed innocent blood", which he defines as "a peaceable revolution".

      • Mahatma Gandhi and Satyagraha: Inspired by Thoreau, Gandhi mobilized thousands of Indians and Chinese in South Africa against the "Black Act".

      Satyagraha, "holding to truth" or "truth-force", is a "weapon of the strong that excludes any recourse to violence and seeks to reach the truth".

      For Gandhi, "overthrowing the oppressor is only half a victory; convincing him to transform himself is a full and complete victory".

      The Salt March (1930) in India is an emblematic example.

      • Martin Luther King Jr. and the Civil Rights Movement: Strongly influenced by Thoreau, King led nonviolent campaigns, notably the Montgomery bus boycott and the Birmingham demonstrations.

      He stresses the importance of "gathering evidence that injustice exists", "negotiating improvements", "developing a so-called purification program based on learning the practices of nonviolence", and "taking action to provoke a crisis meant to open the way to discussion".

      King was not against the State or the Constitution; he relied on them to put an end to segregation.

      4. The Legitimate Violence of the Oppressed: A Contested Perspective

      Whether the violence of the oppressed is legitimate is a crucial point of divergence:

      • Thoreau and John Brown: Although an apostle of nonviolence, Thoreau defended John Brown, an abolitionist who used violent means.

      Thoreau adopts the position of the "nonviolent revolutionary spectator" who "seeks to reduce the violence of life not only by refusing to commit violent acts but also by forming a community of nonviolent spectators who bear witness to the use of violence to put an end to oppression, extermination and exploitation".

      He accepts that there are circumstances in which violence would be "inevitable" if other options prove ineffective, depending on the gravity of the cause, the imminence of the danger and proportionality.

      • Frederick Douglass: This former slave defended recourse to violence for the sake of freedom, arguing that "all human beings have fundamental rights to life and liberty; deprivation of the second, liberty, and the risk of deprivation of the first, life, then justify recourse to violence".

      For him, slaveholders forfeit their own fundamental rights by depriving others of theirs.

      • Frantz Fanon and Decolonizing Violence:

      In "Les Damnés de la Terre" (The Wretched of the Earth, 1961), Fanon defends violence as "indispensable to the process of decolonization".

      He describes the encounter between colonizer and colonized as having "always taken place under the sign of violence".

      For Fanon, colonial violence is not only physical but also moral: it "dehumanizes and even, strictly speaking, animalizes" the colonized.

      The violence of decolonization is "doubly liberating": it leads to independence and "rids the colonized of his inferiority complex [...] it makes him fearless and rehabilitates him in his own eyes."

      Fanon's text is less a plea for violence than a "lyrical description and a subjective explanation of the facts that lead to decolonization; of violence, it reveals the ineluctable, necessary and ultimately justified character".

      • Jean-Paul Sartre and Radicalization: Sartre's preface to "Les Damnés de la Terre" radicalized Fanon's argument, for example with the formula: "to shoot down a European is to kill two birds with one stone, eliminating at the same time an oppressor and an oppressed man: what remains is a dead man and a free man".

      According to Alice Cherky, this interpretation turns Fanon's analysis of the inevitability of violence into an "enthusiastic justification" of murder.

      • Jean Améry and the "Taboo of Vengeance": A survivor of the Shoah, Améry draws a parallel between the situation of the colonized and that of the Jewish inmate in a camp, asserting that "freedom and dignity must be won through violence in order to be freedom and dignity".

      He defends the "taboo of vengeance", arguing that "the oppressor, having suffered the violence inflicted on him by the oppressed, becomes his brother, sharing his humanity".

      5. The Inversion of Values: The Violence of the Colonized as Terrorism

      The text highlights a "considerable moral and political shift" and an "inversion of values" over the past half-century:

      • Historical Legitimation vs. Present-Day Disqualification: Kant, Thoreau, Douglass, Fanon, Sartre and Améry regarded the violence of the oppressed as legitimate and as an affirmation of humanity.

      Today, "the struggle of the colonized is now disqualified and their resistance called terrorism, while the policy of the colonizer is accepted and its brutality justified".

      • Redefinition of Terrorism: Historically, "terror" referred to a state phenomenon (the French Revolution, totalitarian regimes).

      Gradually, the term "terrorism" took on a distinct meaning, designating "practices and groups that are non-state and even oppose the State through violent actions".

      • Terrorism: Weapon of the Weak or Label of Delegitimation?: According to the historian Henry Laurens, non-state terrorism kills fewer people than state terror and serves to communicate about a situation of oppression.

      However, "as a label, it serves today to delegitimize the struggle of the weak and to elude the violence of the strong".

      Figures such as Nelson Mandela, Menahem Begin and Yasser Arafat were labeled terrorists before becoming heads of state and Nobel Peace Prize laureates, which illustrates how unstable this label is.

      6. Refusing Violence: Beyond Dualistic Models

      The course explores subtler forms of refusing violence and questions binary distinctions:

      • Antigone and Bartleby: These literary figures represent two modes of refusal: explicit, spectacular protest (Antigone) and silent, discreet resistance (Bartleby).

      • Conscientious Objectors in Israel: Erica Weiss distinguishes public "resistance" (refuseniks) from "abstention" from military service, which is the majority practice but remains invisible.

      Abstention is better tolerated by the state, whereas public resistance exposes objectors to sanctions and stigmatization.

      • Victims' Tactics of Refusal: For Palestinians under the bombs, refusing violence does not mean preventing it, but "finding a way to distract one's children when the roar of the planes and the blast of the explosions are heard", "holding on to the details of daily life while keeping the distance of a tender humor", or "naming it, describing its effects on bodies, exposing its consequences on lives".

      • Structural Violence and the Complexity of Power Relations: Honour Gun Gunai shows that in Turkey it is the Other (Armenians, Greeks, Kurds) who is accused of violence, never the State. Alpana Roy's research on the Naxalites in India underlines the importance of not "focusing on guerrilla operations and forgetting structural violence".

      Dividing the oppressed ("divide and rule") is a common strategy of oppressors.

      The Politics of Refusal and Epistemic Violence:

      • Audre Lorde: "the master's tools will never dismantle the master's house". The epistemological framework of the dominant must be refused.

      • Audra Simpson ("Mohawk Interruptus"): By refusing the Canadian passport, the Mohawks reject the way the Canadian state has treated them and assert the illegitimacy of colonial legislation. She argues for a "dual sovereignty".

      • Gayatri Chakravorty Spivak ("Can the Subaltern Speak?"): Epistemic violence consists in claiming the right to say who the other is and reducing them to silence.

      • Palestinian Research (Rana Barakat, Diana Allan, Mohamed El Kourd, Nasser Abouour): The aim is to move beyond external analyses in order to make Palestinian voices heard, refuse dehumanization and censorship, and affirm their existence beyond oppression ("writing Palestine studies").

      The example of Nasser Abouour, who makes the wall of his cell his companion and thereby asserts his freedom, is an ultimate illustration of this refusal.

      Conclusion

      Refusing violence is a multifaceted project:

      • It can be nonviolent, in the manner of Gandhi and King, accepting blows in order to denounce oppression.

      • It can be violent, as advocated by Douglass and Fanon, attacking the agents of oppression in order to regain freedom and dignity.

      • It can take the form of contesting conscription for the dominant (Israeli conscientious objectors) or of an imaginative withdrawal from terror for the dominated (Palestinian civilians).

      • It involves rejecting the "division of moral space that leads to turning a blind eye to the brutality of governments and blaming the victims who revolt against their repression".

      • It also means rejecting the authorized narratives of the victors, which erase the version of the vanquished, particularly in the case of Indigenous peoples.

      • In short, there are "many dwellings in the house of nonviolence", and the course calls for an "anthropology of the present" that explores the complexities of violence and the responsibilities of the social sciences in the face of the epistemic violence that makes the vision of the dominant prevail.

    1. Testimony of a pupil's parent. A parent explains how his or her daughter went about choosing a career by weighing the advantages and drawbacks that could come with it. At first she wanted to become a beautician or a hairdresser. To help her refine her plan, this parent shared a few pieces of advice with her: explore the positive and negative aspects of the careers from the angle of the training that leads to them (for example, the distance to the training institution) and from that of the associated constraints (for example, standing all day as a hairdresser); do her 3e (ninth-grade) work placement in the environment of one of these two careers in order to observe its realities first-hand; look beyond the age of 16 at what these careers might entail (for example, the effects of standing all day as she gets older). A pupil's parent from the académie d'Aix-Marseille (13).
    2. Testimony of a pupil's parent. A father explains how his daughter experienced her 3e (ninth-grade) work placement. The placement took place at her mother's company. She was able to discover the full range of jobs there and had not expected to find so many. The experience was very rich in discoveries, and she threw herself into exploring its full scope. A pupil's parent from the académie de Versailles (78).
    1. Testimony of several lycée students on whether they feel they have enough information about their post-baccalauréat options. "Personally, I had to go to the CIO with my father, outside school, to find out about Sciences Po; at school we are not given enough information, even though there are brochures at the CDI, etc." "What bothers me is that we are a bit in the dark: there is too much information, and I find there are too many possibilities after the bac. I would have liked it to be much more focused, because there is far too much stuff."
    2. Testimony of Laure, a terminale (final-year) student in the ISN specialty (informatique et sciences du numérique, computer science and digital sciences), about her current and future path. "For my terminale, I chose computer science and digital sciences, so ISN. It is a specialty that deals with everything digital and everything you can do with it. I chose ISN because computing has always fascinated me, and I wanted to learn to program, which is what we did. I had always wanted to get started, but I never dared on my own because I found it really complicated. I don't really have a career in mind yet, but I think that after my DUT I will do a licence and carry on in computing and most likely programming. That is what interests me: doing what I love, so programming."
    1. The matter admitteth those particular formes materially, and withall obliterateth or blotteth out the contrary forme whereof it was before possessed: the soule of man receiues and entertaines the generall and vniuersall notions of things, free from all contagion or touch of Matter, not abolishing the contrary, or diuers formes whereof before it was possessed. This alone is incorporeall, immortall, [...] or immutable. This may be called the receptacle, promptuary, or store-house of all the species or kinds of things. [Marginal note: How the reception of formes differs in the first matter, and in the soule.]

      The soul receives the notions of things. Incorporeal, immutable: promptuary of all the kinds of things.

    1. [...] capable of applying and generating knowledge that gives them the skills to solve problems, with critical thinking, an ethical sense, entrepreneurial and innovative attitudes, and the creative capacity to incorporate scientific and technological advances that contribute to national and regional development.

      Does TECNM train its teachers to produce professionals with these characteristics?

    1. limited function approximator expressiveness. In particular, neural networks are only universal approximators as their size goes to infinity.

      As I understand it, models are just an approximation. Unless they are infinitely large.
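
A small numerical sketch can make this concrete. It is an illustration only, not something taken from the annotated text: a one-hidden-layer ReLU network with n hidden units can represent a piecewise-linear interpolant with n segments, so on a smooth target its worst-case error stays above zero at any finite width and shrinks as the width grows. The target function (sin on [0, π]), the helper names `build_relu_net` and `max_error`, and the widths tried are all assumptions made for the example; the weights are set by hand rather than trained.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_net(f, a, b, width):
    """One-hidden-layer ReLU net with `width` units that interpolates f
    at width + 1 evenly spaced knots on [a, b] (hand-set weights, no training)."""
    step = (b - a) / width
    knots = [a + i * step for i in range(width + 1)]
    values = [f(k) for k in knots]
    slopes = [(values[i + 1] - values[i]) / step for i in range(width)]
    # The output weight of unit i is the change in slope at knot i.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]

    def net(x):
        # zip stops after `width` units, so the last knot carries no unit.
        return values[0] + sum(w * relu(x - k) for w, k in zip(out_weights, knots))

    return net


def max_error(f, net, a, b, samples=2001):
    """Largest absolute gap between f and the network on an evenly spaced grid."""
    xs = [a + i * (b - a) / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - net(x)) for x in xs)


errors = {
    w: max_error(math.sin, build_relu_net(math.sin, 0.0, math.pi, w), 0.0, math.pi)
    for w in (2, 8, 32)
}
for w, e in errors.items():
    print(f"width={w:3d}  max error={e:.5f}")
```

The error falls as the width grows but never reaches zero for a finite network, which is the sense in which a finite model is "just an approximation".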

    1. complexities of defining and qualifying violence, drawing on varied examples ranging from domestic violence to international conflicts. It highlights the socially constructed character of violence, the moral, legal and political stakes of recognizing and characterizing it, and the power dynamics that underlie these processes.

      1. The Problematic Nature of Defining Violence

      Violence has no simple definition or precise boundary. Its recognition depends on a social, moral and legal qualification.

      • Social and historical qualification: What is considered violent changes over time. For example, "for centuries, hitting one's child when one felt he had done wrong was correcting him [...] and then at the end of the 19th century it became reprehensible and punishable".
      • Multiple dimensions: Qualifying something as violence involves a "moral" dimension (social judgment) and a "legal" one (criminal judgment). In contexts of belligerence or power, a "political dimension" is added, notably when naming violence as "police" violence or perpetrators as "terrorists".
      • Two orders of qualification:
      • Recognition (first order): Establishing whether an act is violent. Alleged perpetrators and their lawyers often attempt a "requalification of the facts" by minimizing, justifying or excusing them.
      • Characterization (second order): Once violence has been recognized, attaching an adjective to it (e.g. "domestic violence", "sexual violence") or qualifying it under international law (e.g. "war crime", "crime against humanity", "genocide").

      2. Violence Seen from Inside vs. from Outside: The Case of Female Genital Mutilation

      Some practices regarded as violent by an outside observer are not seen that way by most members of the society that practices them.

      • Excision in Sudan: The anthropologist Janice Boddy, in her book Civilizing Women (2007), studied excision in northern Sudan. She observes that, although painful, "excision was awaited with impatience, and it was the prospect of not undergoing it that was dreaded". It produced an "idealized genital feminization in terms of cleanliness and purity" and was part of a "moral aesthetic" of gender differentiation.
      • Historical and cultural relativization: Boddy invites a comparison with Western practices:
      • In the 19th century, excision was performed in Europe and North America by the medical profession to treat various disorders (insomnia, sterility, psychological troubles, etc.).
      • Today, female genital surgery for aesthetic purposes ("genital rejuvenation") is on the rise in the Western world, driven by "images of playmates circulating on the internet" and presented as a form of emancipation, "as if in these last two cases one could disregard the male domination it manifests, including among women, through an effect of symbolic violence".
      • Male domination and symbolic violence: The author stresses that these practices, whether traditional or modern, can be seen as manifestations of male domination, adding a "symbolic violence" to the physical violence.

      3. The Rejection of Suffering: The "Kagnalen" Ritual in Casamance (Senegal)

      Even when violence is not "named" as such, the suffering it causes is often perceived by the victims.

      • The Kagnalen: This Diola ritual imposes on women presumed unable to fulfill their procreative function (sterility, repeated miscarriages, deaths of children) a forced exile, a change of identity that is often demeaning ("bitch who does not hold the sperm"), a ridiculous outfit, and "especially exhausting tasks" and "particularly degrading practices".
      • Recognized suffering: Although the ritual is embedded in a "division of social labor" that makes women responsible for biological reproduction, women who have undergone the Kagnalen recall a "painful memory with deep emotion, unable to hold back their tears" when they mention "the initial ceremony and the flogging, the years of mortification and debasement, the emotional isolation and the exhausting labor, the continual fear of reprimands and sanctions".
      • Avoidance and resistance: Some young women try to avoid the ritual by settling in urban areas or by opting for medical care; they "sought to break the circle of symbolic violence in which their society tended to enclose them".

      4. Unspoken Violence and Power Dynamics

      The fact that violence is not qualified as such in the public sphere does not mean that victims fail to recognize it.

      • Implicit recognition: "a thing can exist both in the world and in the awareness agents have of it even if they do not name it as such". Women who were victims of sexual violence recognized the "constraint on the body" without referring to a legal definition.
      • Strategies in the face of recognized violence: Victims can "strive to avoid it" (leaving for the city), "try to fight it" (protest campaigns) or "put up with it" (the force of tradition, the excessive cost of a break), following Albert Hirschman's "Exit, Voice, and Loyalty" model.

      5. State Violence and the Denial of Qualification

      The qualification of violence is a major issue in the case of state violence, where the institution tries to conceal it.

      • Monopoly of legitimate violence: The State claims the "monopoly of legitimate violence" (Max Weber). The distinction lies between "the justified use of force and the inappropriate recourse to violence".
      • Ways of avoiding the qualification of police violence:
      • Pressure on the victim not to file a complaint: Threats ("he had younger brothers who had already gotten into trouble, and if he filed a complaint they would be the ones to have problems").
      • Counter-complaint: Accusations of "contempt and resisting an officer vested with public authority" (outrage et rébellion), which often serve as "a means of covering up their brutality by presenting it as a necessary use of force".
      • Justification: Legal extension of the use of weapons by law enforcement, such as the 2017 law in France, which can go as far as an "authorization to kill in the name of guaranteeing security". This law led to a "fivefold increase in fatal shootings for refusal to comply".
      • Definition of state violence: Not only violence committed by an institution acting on behalf of the State, but also the fact that "the State contributes to its concealment by denying its existence, covering up misconduct, lending its support to the accusations brought [...] and, conversely, sparing the perpetrators through prosecutors' submissions and pressure on judges".
      • Paradox of qualification: "it is because there is a flagrant denial by the State of the violence perpetrated by its representatives and under its command that one can speak of state violence".

      6. The Conflict of Interpretations in International Conflicts: Israel-Palestine

      International conflicts are also "conflicts of interpretation" of the facts, in which words carry considerable political and moral weight.

      • The hermeneutics of discourse: Drawing on Paul Ricœur, the author proposes a "hermeneutics of discourse", that is, an "interpretation of interpretations", to decipher the "hidden meaning behind the apparent meaning".
      • The attack of October 7, 2023: two radical interpretations:
      • Dominant Western interpretation (Israel and its allies): "act of antisemitism", "the greatest antisemitic massacre of our century", compared to a "pogrom" or to the Shoah.
      • Stakes and implications:
      • Places the acts at the "top of the scale of crimes".
      • Justifies "the intensity of the punitive response in Gaza" and Israel's "unconditional right to defend itself".
      • Weakens accusations of war crimes against Israel, since the aim is to eliminate a "terrorist organization".
      • "Rules out any possibility of referring to what happened before it occurred and thus obliterates the history of Palestine".
      • Alternative interpretation (Hamas, countries of the South, some observers): "act of resistance" set within an "asymmetric war" and a "long historical sequence" of dispossession of the Palestinians since the Nakba (1948) and the occupation of the territories (1967).
      • Stakes and implications:
      • Recalls that this is a "war in which a powerful state subjugates a people", not the case of a persecuted minority.
      • Underlines "the passivity of the international community" and its "complicity" in the face of Israel's violations of international law, which deprive Palestinians of alternatives to violence.
      • Makes it possible to understand the meaning of the actors' actions in the light of "struggles against foreign domination".
      • Acknowledges "the weight of history" against the "practices of erasing the violence suffered by Palestinians".
      • The qualification of genocide in Gaza:
      • Arguments in favor: Based on the Convention on the Prevention and Punishment of the Crime of Genocide (1948), citing "the intent to destroy, in whole or in part, a national, ethnical, racial or religious group, as such" and the "first four modalities" (number of dead, wounded, conditions of life, prevention of births). The "International Court of Justice [...] affirmed the plausibility of the request to prevent the commission of a genocide".
      • Arguments against: "a nation composed in part, at its creation, of survivors of a genocide cannot itself be guilty of genocide". The dead are said to be "collateral victims". The dispute plays out "on the terrain of law [...] but also within the framework of international power relations".

      General Conclusion

      The recognition and qualification of violence are social processes that are complex and often conflictual.

      • Gap between victims and perpetrators: Victims "clearly identify the violence they suffer", while perpetrators "either minimize it, justify it or contest it".
      • Power stakes: The "battle is fought in the public sphere [...] as much as on legal ground", and is always "underpinned by political stakes".
      • Political and moral implications: The way violence is named and interpreted has "almost opposite consequences" for the resolution of conflicts: it can either "disqualify the adversary" and "exclude any outcome other than his elimination", or "open up the possibility of a recognition of wrongs and their reparation".
    1. I start with an image and begin to explore it as I write, and little by little I come to understand what that image wants

      Trias's writing process

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1.1. It would be helpful if the authors could discuss whether there is any correlation between cryptic sites and the extent of experimental validation in the Phosphosite database (e.g. those that were only identified in one or a few MS experiments). It is difficult to determine stoichiometry of phosphorylation experimentally, but can any inference be made on the extent of phosphorylation of cryptic sites vs. more conventional sites located in IDRs or on the surface of globular domains?

      We thank the reviewer for this valuable suggestion. To investigate the extent of the experimental validation of phosphosites, we examined the number of supporting studies for each site reported in the PhosphoSitePlus database. Specifically, we summed the values of the LT_LIT (literature-based experiments), MS_LIT (mass spectrometry literature), and MS_CST (Cell Signaling Technology mass spectrometry) fields to count the number of independent studies supporting each phosphorylation site, whether cryptic or non-cryptic. To visualize the results, we plotted the number of supporting references against the relative solvent accessibility (RSA) distribution of phosphosites (Figure R1). The analysis revealed a direct correlation between the RSA of phosphosites and the number of studies supporting their phosphorylation. This observation may arise from an intrinsic difficulty in studying cryptic phosphosites due to their destabilizing effects on native proteins. Notably, no differences were observed in the number of supporting studies within cryptic phosphosites (Figure R1B). We have not mentioned these analyses in the new version of the manuscript. However, we would gladly add them if the editor or the reviewer advises accordingly.
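
The counting step described in this reply can be sketched as follows. This is a minimal illustration, not the authors' script: the field names LT_LIT, MS_LIT and MS_CST come from the reply, while the record layout, the `site` and `rsa` keys, the example values, and the choice to treat empty fields as zero are assumptions.

```python
def supporting_studies(record):
    """Sum the LT_LIT, MS_LIT and MS_CST counts for one phosphosite record.

    Missing or empty fields are treated as zero (an assumption of this sketch).
    """
    total = 0
    for field in ("LT_LIT", "MS_LIT", "MS_CST"):
        value = record.get(field)
        total += int(value) if value not in (None, "") else 0
    return total


# Hypothetical records: the "site" and "rsa" keys and every value are made up.
sites = [
    {"site": "A", "rsa": 0.05, "LT_LIT": "", "MS_LIT": "1", "MS_CST": "2"},
    {"site": "B", "rsa": 0.60, "LT_LIT": "3", "MS_LIT": "10", "MS_CST": "25"},
]

# (RSA, number of supporting studies) pairs, ready to be plotted.
points = [(s["rsa"], supporting_studies(s)) for s in sites]
print(points)
```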

      1.2. The authors note that a larger percentage of tyrosine phosphorylation sites are cryptic compared with serine/threonine sites. I assume that tyrosine itself is more highly enriched in the hydrophobic cores of proteins relative to serine or threonine, due to its bulky hydrophobic side chain. Is the increased proportion of cryptic tyrosine phosphorylation sites more, less, or the same as the proportion of tyrosine in hydrophobic cores relative to serine and threonine?

      We thank the reviewer for this insightful comment. As correctly noted, tyrosine residues tend to be enriched in the hydrophobic cores of proteins, as reflected by their generally lower relative solvent accessibility (RSA) values, regardless of phosphorylation state. This enrichment is likely due to the tyrosine side chain's bulky and partially hydrophobic nature. To address the reviewer's question, we compared the RSA distributions of phosphorylated tyrosine, serine, and threonine residues with that of the same residues non-phosphorylated in the human proteome (Figure R2). In order to statistically compare the two distributions, we employed the Mann-Whitney test. The large sample size inevitably yields very low p-values, even when the distributions differ mildly (pThr, pSer vs non-p Thr, Ser, p [...]

      1.3. Fig. 5D and E: I had some trouble interpreting these figures. Indicating where the native state is in the plots would be helpful (stated in text as lower right, but a rectangle on the plot would make this more obvious). The text discusses three metastable intermediates, but what is the fourth one shown on the figures (well A, close to the native state)? This could be more explicitly explained.

      We added the missing rectangles into the original Fig. 5D and E (see below Figure R3 and R4). The three metastable intermediates discussed in the original text reflect protein conformers in which the cryptic site is exposed to the solvent. Conversely, the fourth state, and the final native state, are conformations in which the site is already partially or fully cryptic. The observation that the masking of cryptic sites coincides with the latest folding steps allows us to hypothesize a mechanism by which cryptic phosphorylation may regulate protein folding. Following the reviewer's suggestion, we now specify more explicitly each conformation in the new version of the legends of the relative figures (text file with track changes, lines 950 and 1017).

      1.4. The fact that phosphomimetic mutations of cryptic sites in SMAD2 and CHK1 lead to lower expression levels and shorter half-lives is not surprising, given the expected disruption of the hydrophobic core by introduction of a charged residue. The results certainly show that if phosphorylated, these sites would decrease expression and half-life. With respect to half-life, however, if the authors are correct and cryptic sites are predominately phosphorylated co-translationally, one would expect that the half-life curves for the wt protein would not be a simple exponential, but would instead reflect two distinct populations: those that are phosphorylated during translation, and are almost immediately degraded, and those that escape phosphorylation and have the same half-life as the non-phosphorylatable mutant. Are the actual experimental results consistent with this two-population model? If not, this would be evidence that some of these cryptic sites can be exposed post-translation, either by thermal fluctuation or biological interactions.

      We thank the reviewer for this insightful point. The readout employed in our study (i.e., western blotting) measures the aggregate signal from the total protein population in the cell culture. It thus reflects average protein levels rather than the dynamics of individual molecules. As such, it is not well-suited to resolving coexisting populations with distinct half-lives. We agree that if phosphorylation of cryptic sites occurs strictly co-translationally, one might expect a biphasic decay curve. However, due to methodological constraints, our assay provides only a single exponential fit to the global turnover kinetics. While we cannot entirely exclude the possibility that cryptic sites may become exposed post-translationally (e.g., due to thermal fluctuations or interactions), our molecular dynamics simulations did not reveal such exposure events within the simulated timescales. Therefore, while the two-population model remains plausible in principle, our results are consistent with a co-translational phosphorylation and degradation model. Forthcoming experiments aimed at characterizing the phosphorylation of ribosome-associated nascent chains in the human proteome may further validate this conclusion.
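
To illustrate what a biphasic decay curve would look like in the two-population model raised by the reviewer, here is a minimal sketch. It is not an analysis of the authors' data: the fraction phosphorylated during translation and both half-lives are hypothetical values chosen only to make the effect visible, and the function names are assumptions.

```python
import math


def remaining_fraction(t, frac_fast, half_fast, half_slow):
    """Fraction of protein left at time t for a mixture of two populations,
    each decaying exponentially with its own half-life."""
    k_fast = math.log(2) / half_fast
    k_slow = math.log(2) / half_slow
    return frac_fast * math.exp(-k_fast * t) + (1 - frac_fast) * math.exp(-k_slow * t)


def apparent_rate(t, dt=1e-3, **params):
    """Local decay rate, -d ln(f)/dt, estimated by a finite difference."""
    f0 = remaining_fraction(t, **params)
    f1 = remaining_fraction(t + dt, **params)
    return -(math.log(f1) - math.log(f0)) / dt


# Hypothetical parameters (hours): 30% of molecules decay fast, 70% slowly.
two_pop = dict(frac_fast=0.3, half_fast=0.5, half_slow=8.0)
# Single-population control: everything decays with the slow half-life.
one_pop = dict(frac_fast=0.0, half_fast=0.5, half_slow=8.0)

for label, params in (("two populations", two_pop), ("one population", one_pop)):
    early = apparent_rate(0.0, **params)
    late = apparent_rate(10.0, **params)
    print(f"{label}: early rate={early:.3f}/h, late rate={late:.3f}/h")
```

On a semi-log plot a single population gives a straight line (constant rate), whereas the mixture starts steep and flattens out. Whether an assay can resolve that curvature depends on its time resolution and noise, which this sketch does not model.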

      1.5. The authors make a point that cryptic phosphosites are more highly conserved than non-cryptic phosphosites, but it is not clear to me whether it is the side chain itself or its ability to be phosphorylated that is conserved. Supplemental Fig. 9, if I am interpreting it correctly, would suggest it is the residue itself and not its phosphorylation that is conserved. If so, wouldn't this suggest that phosphorylation of these cryptic sites is just an inevitable consequence of the conservation of serine, threonine, and tyrosine residues in hydrophobic core regions? If the authors have evidence that argues against this simple hypothesis, they should discuss it (e.g., cryptic phosphosites are more highly conserved in some cases than non-phosphorylated tyrosine, serine, and threonine residues that are not solvent accessible).

      We agree with the reviewer's interpretation. The higher conservation of cryptic phosphosites likely reflects the evolutionary constraint on hydrophobic core residues, which tend to be more conserved due to their role in structural stability. This conservation does not imply phosphorylation at those sites is functionally selected across species. Instead, when such residues are phosphorylated, as we observe in the human proteome, the effect is often destabilizing and associated with protein degradation. Our analysis does not establish that the phosphorylation of cryptic residues is conserved across species, only that the residues themselves are. We appreciate the reviewer's suggestion and now explicitly discuss this point in the revised manuscript to clarify the distinction between residue conservation and phosphorylation conservation (text file with track changes, line 618)

      1.6. Regarding the evolutionary conservation of cryptic sites, have the authors taken into consideration that tyrosine-specific kinases, phosphatases, and reader domains first appeared in the first metazoans, and are for the most part not seen in non-metazoan eukaryotes? I notice some of the proteomes used for the conservation analysis include plants and yeast, which lack most tyrosine phosphorylation.

      We thank the reviewer for this insightful comment. In response to the suggestion, we have recalculated the entropic conservation score by restricting the analysis to metazoan species. This analysis ensures that the evolutionary context more accurately reflects the presence and functional relevance of tyrosine-specific kinases, phosphatases, and reader domains. The comparison between the entropic score distributions calculated with and without non-metazoan orthologues shows statistically significant differences for serine, threonine, and tyrosine. However, the large sample sizes inevitably translate into statistically significant p-values, even when the differences in mean are minimal and the standard deviations relatively small. To better assess the practical relevance of these differences, we calculated Cohen's d as a measure of effect size (Table R1). The coefficient helps assess the size and biological significance of a difference (>0.2 = small effect; >0.5 = medium effect; >0.8 = large effect). The analysis indicates only a very modest deviation in entropic scores whether or not non-metazoan orthologues are included.
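
For readers unfamiliar with the effect-size measure mentioned here, the sketch below computes Cohen's d with a pooled standard deviation and labels it using the thresholds quoted in the reply. The reply does not say which variant of Cohen's d was used, so the pooled form is an assumption, and the two samples are made-up numbers, not entropic scores from the study.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (one common definition)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Map |d| to the categories quoted in the reply (>0.2, >0.5, >0.8)."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "medium"
    if size > 0.2:
        return "small"
    return "negligible"


# Made-up samples for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
d = cohens_d(group_a, group_b)
print(f"d = {d:.3f} ({effect_label(d)})")
```

Unlike a p-value, d does not shrink or grow with sample size alone, which is why it is useful when very large samples make almost any difference statistically significant.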

      1.7. I find the argument that phosphorylation of exposed core residues is part of normal protein quality control/proteostasis to be convincing. Can the authors provide any experimental evidence to support this model (for example, greater phosphorylation of cryptic sites under stress conditions)? I don't think these experiments are necessary, but would seem to be a logical next step and could be done quite easily through collaboration.

      We appreciate the reviewer's suggestion and fully agree that demonstrating greater phosphorylation of cryptic sites under stress conditions would be an exciting future direction. We are conducting experiments on individual tumor suppressors such as p53 and PTEN, which harbor cryptic phosphosites, to test whether cellular stress conditions enhance phosphorylation at these positions. These studies assess whether such modifications contribute to altered protein stability or function in stress or disease contexts, particularly cancer. We plan to communicate these results in forthcoming publications and are currently open to collaborations to broaden this line of investigation.

      1.8. The authors note at the end of the discussion that targeting cryptic phosphosites might be a strategy to selectively degrade some proteins in cancer. Practically, how would this work? I can't think of how, but perhaps the authors can provide more specific suggestions.

      We thank the reviewer for raising this important point. One promising approach to therapeutically exploit cryptic phosphosites builds on the PPI-FIT principles (Pharmacological Protein Inactivation by Folding Intermediate Targeting). This strategy targets transient structural pockets appearing only in folding intermediates (Spagnolli et al., Comm Biology 2021). In this context, kinases that phosphorylate cryptic sites could be modulated, either inhibited or redirected, so that misfolded or oncogenic proteins are selectively marked for degradation. For example, selectively enhancing the phosphorylation of a cryptic site on an oncogenic protein could destabilize it and promote its degradation via the proteasome. Conversely, preventing phosphorylation at a cryptic site on a tumor suppressor (e.g., by inhibiting the specific kinase) could enhance protein stability and restore function. While this concept is still emerging, it offers an exciting therapeutic avenue that complements our findings. We added a paragraph addressing this point in the discussion section of the new version of the manuscript (text file with track changes, line 716).

      1.9. Introduction: "It involves the addition of a phosphate to an hydroxyl group found in the side chain of specific amino acids, typically serine, threonine or tyrosine residues." Of course serine, threonine, and tyrosine are the only standard amino acids with a simple hydroxyl group, so "typically" is not needed here.

      We have removed the word "typically" to reflect the accurate chemical specificity of phosphorylation events (text file with track changes, line 82).

      1.10. In my view this is an important study, bringing rigor and a broad proteomic perspective to a phenomenon that (to my knowledge) had not been carefully examined previously. In terms of the big picture, I am of two minds. On the one hand, showing that phosphorylation of hydrophobic core residues exposed during translation or the early stages of folding can regulate steady state levels of some proteins provides an intriguing new mechanism to control the complement of proteins in the cell, and is potentially an area of regulation in normal physiology or in disease. On the other hand, if this is just part of the normal proteostatic mechanisms (hydrophobic core residues exposed for too long consign the protein to degradation, before it can lead to aggregation and other problems), that is a little less interesting to me. I think future work to tease out whether this mechanism is actually regulated and used by the cell to transmit information will be key. But the first step is showing that the phenomenon is real and widespread, and in my view this preprint accomplishes that goal very well.

      We appreciate the reviewer's thoughtful summary and agree that distinguishing between passive proteostatic clearance and active regulatory function is essential. Toward this goal, we plan to carry out a phosphoproteomic analysis of ribosome-associated nascent chains. By mapping phosphorylation events during translation, we aim to validate our cryptic phosphosite dataset in a co-translational context and potentially identify novel regulatory modifications. This approach will also help us assess whether phosphorylation at cryptic sites is modulated context-dependently, thereby supporting a role in regulated protein expression rather than solely quality control.

      2.1. Evolutionary comparison whether cryptic and non-cryptic sites are differently conserved. Two distinct distributions for cryptic and non-cryptic phospho-sites are observed and Figure 6 shows two entropy distributions of cryptic v non-cryptic. Here it is unclear whether this is significant given the different distributions of the two types when non modified.

      We thank the reviewer for raising this critical point. Due to the large sample sizes in our analysis, statistical tests inevitably yield very low p-values, even when differences in mean are minimal and the standard deviations relatively small. To better assess the practical relevance of these differences, we calculated Cohen's d as a measure of effect size (Table R2). The comparison between cryptic and non-cryptic phosphosites yielded an effect size (Cohen's d = 0.4028) slightly lower than the one obtained for residues lying within protein cores or exposed on protein surfaces (Cohen's d = 0.5126), both indicating a modest but meaningful shift in entropic scores. In contrast, the comparisons between cryptic phosphosites and all core residues, as well as non-cryptic phosphosites and all surface residues, showed negligible effect sizes (Cohen's d = 0.0245 and 0.1326, respectively). These findings suggest that while statistical significance is achieved in all cases, only the difference between cryptic and non-cryptic phosphosites, or core and surface residues, reflects a meaningful biological signal. We have now included these data in the new version of the manuscript (text file with track changes, line 544).

      2.2. The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However PTM distribution on proteins is very skewed (many papers have identified clusters, hot spots, structural dependencies etc...) and therefore comparing modified sites on different residues and in different protein regions and with non-modified residues has to be very stringently controlled.

      We fully agree with the reviewer that PTM distribution is non-random and influenced by structural and functional constraints, making comparative analyses challenging. To ensure rigor, we implemented a robust computational pipeline. Unlike other PTMs found almost exclusively on solvent-exposed residues, phosphorylation uniquely showed a distinct subset of sites with extremely low solvent accessibility. This pattern held even after applying stringent structural and dynamical filters. Specifically, we excluded low-confidence residues, small or unstructured domains, and sites that become exposed due to thermal fluctuations, using the SPECTRUS-based dynamic analysis. While we cannot entirely rule out context-specific exposure in fully folded proteins (e.g., during protein-protein interactions), we validated selected cryptic sites experimentally, and our findings were consistent with the computational predictions. We believe this multilayered approach strengthens the reliability of our classification and distinguishes cryptic phosphosites from the broader PTM landscape.

      2.3. Very basic question: How do you assessed the RSA value of the residues from the alphafold structure. If it is sequence based, then it is unclear what the alpha fold structure actually contributes in this step? Although I assume it is structure based, it is not well described, only a reference.

      We calculated the RSA values using the Shrake-Rupley algorithm implemented in the MDTraj Python library. This is a structure-based metric: for each PTM-carrying residue, we evaluated the absolute SASA from the 3D AlphaFold structure and normalized it against the theoretical maximum exposure for that residue in a Gly-X-Gly tripeptide, as defined in Tien et al. (2013). Thus, AlphaFold structures directly provide the atomic coordinates necessary for solvent accessibility estimation. We have now revised the Methods section to describe this process more explicitly (text file with track changes, lines 110 and 113).
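
      The calculation described above can be sketched as follows. This is a hedged illustration, not the authors' script: the helper names are hypothetical, the maximum-exposure values for Ser, Thr, and Tyr are the theoretical values as we recall them from Tien et al. (2013) and should be checked against that paper, and MDTraj is imported only inside the function that needs it.

```python
# Theoretical maximum ASA in Å^2 (assumed values recalled from Tien et al., 2013;
# verify against the original table before use).
MAX_ASA_A2 = {"SER": 155.0, "THR": 172.0, "TYR": 263.0}


def relative_sasa(abs_sasa_nm2, resname, max_asa=MAX_ASA_A2):
    """Normalize an absolute SASA (nm^2, MDTraj's unit) by the residue's maximum ASA (Å^2)."""
    abs_sasa_a2 = abs_sasa_nm2 * 100.0  # 1 nm^2 = 100 Å^2
    return abs_sasa_a2 / max_asa[resname]


def residue_rsa_from_structure(pdb_path):
    """Per-residue RSA for S/T/Y residues of a structure file (e.g. an AlphaFold model).

    Requires MDTraj; the file path is supplied by the caller.
    """
    import mdtraj as md  # imported lazily so the normalization helper works without it

    traj = md.load(pdb_path)
    # Shrake-Rupley SASA summed per residue; shape (n_frames, n_residues), in nm^2.
    sasa = md.shrake_rupley(traj, mode="residue")[0]
    rsa = {}
    for residue, value in zip(traj.topology.residues, sasa):
        if residue.name in MAX_ASA_A2:
            key = (residue.chain.index, residue.resSeq, residue.name)
            rsa[key] = relative_sasa(float(value), residue.name)
    return rsa


if __name__ == "__main__":
    # A serine exposing 0.31 nm^2 (31 Å^2) has an RSA of 0.2.
    print(round(relative_sasa(0.31, "SER"), 3))
```

      Because the SASA is computed from atomic coordinates, the result depends on the 3D model and not on the sequence alone, which is the point raised by the reviewer.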

      2.4. Given that the different residues S,T,Y but also K for glycosylations etc. have a very different baseline RSA distribution, the distributions of modified residues as such are not so informative. Are the distributions of residues with the alpha fold LOD 0.65 different between modified and non-modified?

      2.5. Same point: it is very clear that "tyrosine presenting a larger proportion of cryptic phosphor-sites", as they mainly are within folded domains to begin with. The pattern of phosphorylation and clustering is very different between the modified amino acid residue T,S,Y and needs consideration, given the large number of PTMs, a simple distribution is not sufficient to argue.

      As already discussed in point 1.2 above, and as this reviewer also correctly notes, tyrosine residues are generally enriched in the hydrophobic cores of proteins, which is reflected in their typically low RSA regardless of phosphorylation status. This tendency likely arises from the bulky and partially hydrophobic nature of the tyrosine side chain. To address the reviewer's question, we compared the RSA distributions of phosphorylated tyrosine, serine, and threonine residues with those of all tyrosine, serine, and threonine residues in the human proteome. We found that phosphorylated residues consistently exhibit higher RSA values than the overall averages for their respective amino acids. This is expected, as phosphorylation within protein cores would likely be destabilizing. The existence of low-RSA phosphorylated residues therefore represents a significant deviation from the general tendency of phosphorylated tyrosine, serine, and threonine residues and suggests that cryptic sites may become accessible only transiently along protein folding pathways.

      2.6. Figure 3E (proteins need names in the figure ): the cryptic site T222 (Chk1) is not in the quasi ridged domain, it is in a light color region. What is actually the SPECTRUS cutoff? The Pidc is only one sentence in the main text? It says fewer than 80% intradomain contacts in rigid domains i.e. >0.8, right, but is the domain rigid?

      We have revised the original figure in the new version of the manuscript to include protein names and clarified the domain assignments. The cryptic phosphosite T222 in Chk1 lies within a quasi-rigid domain, as identified by SPECTRUS. The colors in the image do not reflect any structural property; they are used only to distinguish different quasi-rigid domains. In particular, black regions identify unstructured domains, whereas shades from dark grey to white identify quasi-rigid domains. We apologize for the lack of clarity. We have corrected the figure legend accordingly (text file with track changes, line 912).

      There is no cutoff in SPECTRUS' identification of quasi-rigid domains. Non-quasi-rigid domains are simply regions of the protein that SPECTRUS cannot process properly, that is, regions that cannot be modelled as quasi-rigid because of their large intrinsic fluctuations.

      We also expanded the description of Pidc in the main text to clarify that it quantifies the proportion of intra-domain contacts made by the phosphosite's side chain, and that a cutoff of ≥0.8 was used to retain only residues well-integrated within rigid domains (text file with track changes, line 243).
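
      To make the Pidc criterion concrete, the snippet below gives a minimal sketch of the filter. It assumes that the contacts of the phosphosite's side chain and the quasi-rigid domain label of each contacting residue have already been computed elsewhere; how contacts are defined is not stated here, and the function names and domain labels are hypothetical.

```python
def intra_domain_contact_fraction(site_domain, contact_domains):
    """Fraction of the site's side-chain contacts that fall in the site's own domain.

    site_domain: domain label of the phosphosite (any hashable label).
    contact_domains: domain labels of the residues contacted by the side chain.
    """
    if not contact_domains:
        return 0.0  # a side chain with no contacts is not integrated in any domain
    same = sum(1 for label in contact_domains if label == site_domain)
    return same / len(contact_domains)


def passes_pidc_filter(site_domain, contact_domains, cutoff=0.8):
    """Keep a site only if its intra-domain contact fraction is >= cutoff (0.8 in the text)."""
    return intra_domain_contact_fraction(site_domain, contact_domains) >= cutoff


if __name__ == "__main__":
    # Hypothetical example: 4 of 5 contacts lie in the same quasi-rigid domain "Q1".
    print(passes_pidc_filter("Q1", ["Q1", "Q1", "Q1", "Q1", "Q2"]))
```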

      We hope these updates will resolve the ambiguities noted and more clearly define the criteria used in our filtering pipeline.

      2.7. The evolutionary comparison (which is not my core expertise), seems again like comparing different things. Why not comparing cryptic and non-cryptic sites in the same protein regions? Also p-Y are, evolutionarily speaking, very different to p-S and p-T. How is this possibly considered in one distribution. p-Y analysis needs to be separated from the p-T and p-S analyses here.

      We want to clarify that our evolutionary analyses compare residues at aligned positions in orthologous proteins across multiple species. This approach ensures that each cryptic or non-cryptic phosphosite is assessed in its native structural and sequence context. The comparison is therefore not between different regions; it evaluates the evolutionary conservation of specific sites across species, allowing for a direct and meaningful comparison of cryptic and non-cryptic phosphosites. To address the second point, we report below the entropic score distributions for serine/threonine and tyrosine separately (Figure R5).

      2.8. Have the authors thought of randomization of their data to see whether the distributions are significant?

      We are unsure we fully understand what the referee means by randomizing the data in this case.

      However, according to the mathematical definition of entropic score, the limit case in which, within each orthogroup, the phosphorylated amino acid is replaced by a completely random residue yields an entropic score of 1. The opposite limit, in which all members of the orthogroups have the same amino acid in the position of the phosphorylated amino acid, yields an ES of 0. We have added a paragraph in the methods to stress this point (text file with track changes, line 354).
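
      The two limits described above can be reproduced with a small sketch. The exact formula of the entropic score is not given in this response, so the code assumes one common definition that is consistent with both limits: the Shannon entropy of the residues observed at the aligned position, normalized by the logarithm of the alphabet size (20 amino acids). The function name and the example columns are hypothetical.

```python
import math
from collections import Counter


def entropic_score(column, alphabet_size=20):
    """Normalized Shannon entropy of an alignment column (assumed definition).

    column: residues observed at one aligned position across an orthogroup.
    Returns 0 when all residues are identical and 1 when all `alphabet_size`
    residue types are equally frequent.
    """
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(alphabet_size) + 0.0  # "+ 0.0" avoids returning -0.0


if __name__ == "__main__":
    conserved = "SSSSSSSSSS"             # same residue in every orthologue
    scrambled = "ACDEFGHIKLMNPQRSTVWY"   # every amino acid once
    print(entropic_score(conserved), round(entropic_score(scrambled), 6))
```

      Under this definition, a column drawn at random from the 20 amino acids approaches a score of 1 as the orthogroup grows, matching the limit case described in the response.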

      2.9. Labeling in Suppl Figures is insufficient. E.g. In S6 what are the various WT, A and D numbering, are this independent stable transfections/clones? Figure S7 what is R?

      Thank you for pointing this out. We have now corrected the missing information in the revised version of the manuscript (text file with track changes, from line 992 to 1008).

      2.10. Whether or not findings are "impressive" should be up to the reader, please remove these attributes in the text.

      We agree with the reviewer's suggestion. We have removed subjective language such as "impressive" from the revised manuscript to ensure an objective and neutral tone, allowing readers to independently evaluate the significance of our findings (text file with track changes, line 454).

      3.1. Residues with pLDDT scores below 65 were excluded from the analysis. The high-confidence measure applies to individual residues, regardless of whether the domains they belong to are also predicted with high confidence. Identifying the number of domains containing PTMs with overall high-confidence predictions could provide better insights into the orientation of modified residues within domain structures. To assess the relationship between residue-specific confidence and domain stability, we can analyze the correlation between high-confidence modified residues and the overall prediction accuracy of their domains. This could be quantified using the average error scores of domain residues. Additionally, using the average pLDDT score would indicate how many individual residues were predicted with high local structural confidence. In contrast, the average PAE (Predicted Aligned Error) score would provide insights into how well each residue's position is predicted relative to others within the domain, reflecting overall domain structural confidence.

      Our analysis excluded residues with pLDDT scores below 65 to ensure high local confidence. While pLDDT provides residue-level structural confidence, assessing domain-wide prediction quality offers additional insights into modified residues' spatial organization and exposure. However, a domain-level interpretation is currently limited by the format of AlphaFold structural predictions. Specifically, AlphaFold does not provide Predicted Aligned Error (PAE) matrices for sequences split into overlapping fragments, a method used for proteins longer than 2,700 amino acids. These fragment predictions are only available in the downloadable AlphaFold proteome archives, not through the web interface, and lack the global alignment metrics (such as PAE) necessary for analyzing domain stability or inter-residue confidence within the domain context.

      3.2. "Approximately 65% of proteins with cryptic phosphosites contained only one or two such residues, while less than 10% had five or more sites (Supp. Figure 3)." To better interpret this trend, it would be useful to analyze the total number of cryptic PTMs on proteins part of this study, including all modification types-not just phosphorylation. This would help determine whether the observed pattern is specific to phosphorylation or if it extends to other post-translational modifications as well.

      To compare the occurrence of different cryptic PTMs, we extended our analysis to include all cryptic post-translational modifications annotated in PhosphoSitePlus, including phosphorylation, glycosylation, methylation, sumoylation, and ubiquitination. The approach allowed us to assess whether the observed distribution of cryptic phosphosites is unique or represents a more general feature of all cryptic PTMs. We observed extensive variation among the different PTMs in the proportion of proteins carrying 1, 2, or more of the same cryptic PTM (see Table R3). However, it must be noted that the relatively low number of cryptic PTMs, excluding phosphorylation, could make it difficult to determine whether these patterns reflect actual biological trends or are simply influenced by the sample size. We have not included these data in the new version of the manuscript, but we would be willing to add them if the editor or the reviewer advises us accordingly.

      3.3. For the validation of cryptic sites, selecting domains under 200 amino acids was mentioned. However, was there also a minimum length threshold applied, similar to the filtering criteria used for false positives (less than 40 ignored)?

      The 40-residue threshold was applied because protein domains that are too small cannot be reliably subdivided into quasi-rigid domains. Trying to run SPECTRUS on structures with fewer than 40 residues inevitably returns a warning, reflecting the intrinsic cooperative nature of quasi-rigid domains. In fact, entities composed of too few amino acids cannot properly arrange themselves into 3D structures and tend to be disordered. The same reasoning was applied when choosing the proteins to simulate. In particular, for the refolding simulations, we selected protein domains possessing the following properties:

      1. Shorter than 200 amino acids to limit the computational demands.
      2. Long enough to fold into an ordered 3-dimensional conformation reliably.
      3. Have an experimentally determined NMR or X-ray crystal structure.

      3.4. To test their hypothesis that phosphorylation affects protein expression, they selected candidates for serine and threonine but excluded tyrosine. What were the reasons for not including tyrosine-related PTMs in their analysis?

      Our experimental assays relied on phosphomimetic substitutions to mimic the effect of phosphorylation. While serine/threonine phosphorylation can be reasonably mimicked by E or D substitutions, there is no reliable single-residue mimic for phosphotyrosine. Indeed, E or D substitutions do not recapitulate the structural or electronic features of pTyr. Given these limitations, we excluded tyrosine phosphosites from experimental validation to avoid generating inconclusive or misleading data.

      3.5. Do we know that the regulatory role of S300 on PYST1 is associated with the dual specificity of the phosphatase, and is this why it was selected as a negative regulator? While the regulatory roles of the other analyzed phosphosites on SMAD and CHK1 are discussed, there is limited mention of the specific role of S300 on PYST1 within the scope of the study.

      S300 of PYST1 was selected not due to known regulatory relevance, but for technical convenience. PYST1 is a relatively small protein, facilitating computational simulations. We also had suitable reagents for detection (i.e., expression vector), and importantly, S300 was identified as a false-positive cryptic phosphosite removed by our dynamic filtering. It was a practical and structurally matched negative control for validating our computational pipeline.

      3.6. When comparing the entropic scores between cryptic and non-cryptic residues, the medians are 0.43 and 0.52, respectively. Although this difference is not very high, they do observe that cryptic residues have lower scores than non-cryptic ones. The distributions also show greater overlap (Figure 6). I'm wondering if any statistical testing would help assess how distinct these two groups really are.

      We thank the reviewer for this comment, which was also raised by reviewer #2 and is answered above (point 2.1). Briefly, given our large sample sizes, statistical tests often yield very low p-values even for minor differences. To assess the biological significance, we calculated Cohen's d (Table R2 above). The effect size between cryptic and non-cryptic phosphosites (d = 0.4028) was modest but meaningful, and slightly lower than that between core and surface residues (d = 0.5126).

      3.7. Why did the authors choose to rely on AlphaFold data instead of examining PDB structures? I didn't see any explanation or rationale provided for preferring AlphaFold predictions over experimentally determined structures from the PDB.

      We appreciate the value of this comment. We focused on AlphaFold to maximize proteome-wide coverage. Indeed, although PDB structures offer experimentally validated conformations, their sparse and uneven proteome coverage (particularly for membrane proteins, low-abundance factors, and intrinsically disordered regions) precludes a truly global analysis. AlphaFold2 models, by contrast, deliver accurate, full-length structures for nearly the entire human proteome, enabling unbiased, large-scale mapping of cryptic phosphosites. Nonetheless, we performed the same analysis using high-resolution structures from the Protein Data Bank (PDB). The results were fully consistent with those based on AlphaFold predictions, indicating that our findings hold across the two structural sources (see Figure R6 below).

      3.8. Novelty - The concept that cryptic site modifications can dysregulate signaling in cancer and other diseases is known, but systematically categorizing PTM sites into cryptic and non-cryptic to generate hypotheses for a wide range of identified PTMs remains an underdeveloped approach. This study establishes a framework for classifying PTMs based on their structural accessibility, integrating AlphaFold predictions, molecular dynamics simulations, solvent accessibility analysis, and phylogenetic conservation metrics. This approach not only enhances our understanding of PTM-mediated regulatory mechanisms but also provides a foundation for exploring how cryptic modifications contribute to protein function, stability, and disease progression.

      We appreciate the reviewer's comment. To our knowledge, this is the first study to introduce and define "cryptic phosphosites" as a structurally distinct and functionally relevant subset of phosphorylation sites. While some individual cases of buried amino acids influencing cancer-related proteins have been reported, no previous study has systematically mapped, filtered, and analyzed these sites across the human proteome using integrated structural, dynamical, evolutionary, and experimental criteria.

      3.9. The study relies primarily on predicted protein structures (e.g., AlphaFold), without exploring experimentally derived structures, which could provide more accurate and physiologically relevant insights.

      We have addressed this point above (see reply to #3.7).

      3.10. While the research demonstrates the impact of cryptic PTMs on protein function, it would be valuable to also investigate non-cryptic sites from their annotated data. By examining the effects of modifications on these non-cryptic sites, the study could further validate the importance of the cryptic versus non-cryptic classifications and help clarify the functional relevance of both types of sites.

      We thank the referee for this thoughtful suggestion. We compared the proportion of cryptic or non-cryptic phosphosites associated with cancer- and disease-related mutations in each group from the COSMIC and PTMVar datasets. The percentage of phosphosites associated with the two repositories is essentially the same for cryptic and non-cryptic sites. This observation suggests that, despite their different structural and regulatory features, both site types occur similarly in disease contexts (see Table R4). We have included these data in the new version of the manuscript (text file with track changes, line 1067; and new Supp. Table 3).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Review on Gasparotto et al "Mapping Cryptic Phosphorylation Sites in the Human Proteome"

      Gasparotte et al assess the solvent accessibility of 87,138 post-translationally modified amino acids in the human proteome (from phosphosite plus). There initial observation is that a large fraction of modified sites are buried, a finding that is pronounced for phosphorylation but not other modifications. Their approach is using alpha fold 3D structures (0.65 cut off) and RSA prediction to get a set of buried sites. Further refinement includes the removing of low-confidence segments (such as loops, linkers, or short disordered regions) and to use SPECTRUS to identified quasi-rigid domains. The idea is that quasi rigid domains may not breathe and thus will be modified during the synthesis or folding.

      They generated a final dataset of 10,606 cryptic T, S and Y phosphor-sites in 5,496 proteins and state that: "These data indicate that ~5% of all known phospho-sites are cryptic. Impressively, the number translates to ~33% of phosphorylated proteins in the human proteome presenting at least one cryptic phospho-site." They focus on S417 of the SMAD2, T382 of Chk1, known to be associated with loss of function effects or proteasomal degradation and S300 of PYST1 negative control. They stably express these proteins as phospho-mimicry or alanine substitution in HEK293. Expression levels were reduced in the phosphor-D- mutant versions and upon cycloheximide treatment a reduction of the turnover time for the phospho-D CHK1 was observed. I think we are looking a large clonal difference in the supplemental figures.

      The examples are supported by MD simulations that suggest that cryptic phospho-sites can occur during the folding process and affect protein homeostasis by drastically increasing degradation rate and leading to rapid turnover; Essentially the phospho-versions show a solvent exposure. Evolutionary comparison whether cryptic and non-cryptic sites are differently conserved. Two distinct distributions for cryptic and non-cryptic phospho-sites are observed and Figure 6 shows two entropy distributions of cryptic v non-cryptic. Here it is unclear whether this is significant given the different distributions of the two types when non modified. Finally, overlay of the sites with cancer mutations lists 221 mutations in COSMIC associated with cryptic phosphosites that have been annotated as cancer-related and 138 mutations in PTMVar linked to cancer and other human pathologies. The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However PTM distribution on proteins is very skewed (many papers have identified cluster, hot spots, structural dependencies etc...) and therefore comparing modified sites on different residues and in different protein regions and with non-modified residues has to be very stringently controlled.

      Points for consideration

      • Very basic question: How do you assessed the RSA value of the residues from the alphafold structure. If it is sequence based, then it is unclear what the alpha fold structure actually contributes in this step? Although I assume it is structure based, it is not well described, only a reference.
      • Given that the different residues S,T,Y but also K for glycosylations etc. have a very different baseline RSA distribution, the distributions of modified residues as such are not so informative. Are the distributions of residues with the alpha fold LOD 0.65 different between modified and non-modified?
      • Same point: it is very clear that "tyrosine presenting a larger proportion of cryptic phosphor-sites", as they mainly are within folded domains to begin with. The pattern of phosphorylation and clustering is very different between the modified amino acid residue T,S,Y and needs consideration, given the large number of PTMs, a simple distribution is not sufficient to argue.
      • Figure 3 E (proteins need names in the figure ): the cryptic site T222 (Chk1) is not in the quasi ridged domain, it is in a light color region. What is actually the SPECTRUS cutoff? The Pidc is only one sentence in the main text? It says fewer than 80% intradomain contacts in rigid domains i.e. >0.8, right, but is the domain rigid?
      • The evolutionary comparison (which is not my core expertise), seems again like comparing different things. Why not comparing cryptic and non-cryptic sites in the same protein regions? Also p-Y are, evolutionarily speaking, very different to p-S and p-T. How is this possibly considered in one distribution. p-Y analysis needs to be separated from the p-T and p-S analyses here.
      • Have the authors thought of randomization of their data to see whether the distributions are significant?
      • Labeling in Suppl Figures is insufficient. E.g. In S6 what are the various WT, A and D numbering, are this independent stable transfections/clones? Figure S7 what is R?
      • Whether or not findings are "impressive" should be up to the reader, please remove these attributes in the text.

      Significance

      The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However PTM distribution on proteins is very skewed (many papers have identified cluster, hot spots, structural dependencies etc...) and therefore comparing modified sites on different residues and in different protein regions and with non-modified residues has to be very stringently controlled.

      main conclusion: 5% of all known phospho-sites are cryptic, at least one in 1/3 of structured protein regions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through βarrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning. 

      We thank the reviewer for the comments. We have provided a point-by-point response to the reviewer’s comments below and incorporated the suggestions in our revised manuscript. Modified parts of the manuscript are highlighted in yellow.

      Comments:

      (1) The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are. 

      We thank the reviewer for pointing this out. We have added the color scheme to the figure caption. 

      (2) For the metadynamics simulations were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.  

      We thank the reviewer for the suggestion. We agree that the Gaussian height/width may affect the unbinding pathway. However, we would like to point out that we used the well-tempered version of metadynamics, in which the effective Gaussian height decreases as bias deposition progresses. Therefore, we believe that the Gaussian height/width should have minimal impact on the unbinding pathway. To address the reviewer's suggestion, we conducted additional well-tempered metadynamics simulations varying key parameters such as the bias height, bias factor, and deposition rate, all of which can influence the sampled space. The values originally used in the paper for the bias height, bias factor, and deposition rate are 0.4 kcal/mol, 15, and 1/5 ps<sup>-1</sup>, respectively. We explored different values for these parameters and projected the sampled space on top of the previously sampled region (Figure S4). We observed that the new simulations sample a similar unbinding pathway in the extracellular direction and explore a similar space in the binding pocket as well. 

      Results and Discussion (Page 10)

      “We also performed unbinding simulations using well-tempered metadynamics parameters (bias height, bias deposition rate and bias factor) to confirm the existence of alternative pathways (Figure S4). However, the simulations show that ligands follow the similar pathway for all metadynamics runs.”
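
As a rough illustration of why the deposited Gaussian height shrinks in well-tempered metadynamics, the sketch below applies the standard well-tempered scaling, w = w0 * exp(-V(s) / (k_B * dT)) with dT = (bias factor - 1) * T, to the initial height (0.4 kcal/mol) and bias factor (15) stated in the response. The temperature of 300 K, the accumulated-bias values, and the function name are assumptions made purely for illustration; they are not taken from the manuscript.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def well_tempered_height(initial_height, accumulated_bias, bias_factor, temperature=300.0):
    """Height of the next Gaussian, given the bias already deposited at the current CV value.

    The temperature default (300 K) is an assumption for this sketch.
    """
    delta_t = (bias_factor - 1.0) * temperature
    return initial_height * math.exp(-accumulated_bias / (KB_KCAL_PER_MOL_K * delta_t))


# Hypothetical accumulated-bias values (kcal/mol) at a revisited point in CV space.
for bias in (0.0, 2.0, 5.0, 10.0):
    height = well_tempered_height(0.4, bias, 15.0)
    print(f"accumulated bias {bias:5.1f} kcal/mol -> next height {height:.4f} kcal/mol")
```

The height falls monotonically with the bias already present, which is the behavior the response relies on when arguing that the initial Gaussian height has limited influence on the sampled pathway.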

      (3) It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as the Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022). 

      We appreciate the reviewer's feedback. We have incorporated additional citations of studies demonstrating the use of TRAM as an estimator for both kinetics and thermodynamics (e.g. Ligand binding: Ge, Y. and Voelz, V.A., JCP, 2022[1]; Peptide-protein binding kinetics: Paul, F. et al., Nat. Commun., 2017[2], Ge, Y. et al., JCIM, 2021[3]). Additionally, we have included references to studies in which biased simulations were initially used to explore the conformational space and the results were then used to seed unbiased simulations for building a Markov state model (Metadynamics: Sun, X. et al., eLife, 2018[4]; Umbrella Sampling: Abella, J. R. et al., PNAS, 2020[5]; Replica Exchange: Paul, F. et al., Nat. Commun., 2017[2]).

      (4) What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it is not clear what distributions are being compared. 

      We apologize for this confusion. The K-L divergence analysis was performed on the probability distributions of the inverse distances between residue pairs from any two macrostates. Each macrostate was represented by 1000 frames that were selected in proportion to the TRAM stationary density. All possible pairwise inverse distances were calculated per frame for these calculations. Although K-L divergence is inherently asymmetric, we symmetrized the measurement by averaging the two directions. The per-residue K-L divergence, which is shown in the main figures as a color and thickness gradient, was calculated by summing over all pairs that involve the residue. We have included a detailed discussion of K-L divergence in the Methods section. We have also modified the Results section to add a brief discussion of the K-L divergence methodology.

      Results and Discussion (Page 15)

      “We further performed Kullback-Leibler divergence (K-L divergence) analysis between inverse distance of residue pairs of two macrostates to highlight the protein region that undergoes high conformational change with ligand movement.”

      Methods (Page 33)

      “Kullback–Leibler divergence (K-L divergence) analysis was performed to show the structural differences in protein conformations in different macrostates[4,114] . In this study, this technique was used to calculate the difference in the pairwise inverse distance distributions between macrostates. Each macrostate was represented by 1000 frames that were selected proportional to their TRAM weighted probabilities. Although K-L divergence is an asymmetric measurement, for this study, we used a symmetric version of the K-L divergence by taking the average between two macrostates. Per residue contribution of K-L divergence was calculated by taking the sum of all the pairwise distances corresponding to that residue. This analysis was performed by inhouse Python code.”  
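
A minimal sketch of the analysis described above, assuming histogram-based density estimates and synthetic inverse distances; the manuscript's in-house code is not shown in the response, so the bin count, the smoothing constant `eps`, and the function names here are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def symmetric_kl(samples_a, samples_b, bins=50, eps=1e-10):
    """Symmetrized K-L divergence between two 1-D samples, using shared histogram bins."""
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    p, _ = np.histogram(samples_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_b, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()  # eps avoids log(0) in empty bins
    q = (q + eps) / (q + eps).sum()
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)  # average of the two asymmetric directions


def per_residue_kl(inv_dist_a, inv_dist_b, pairs, n_residues):
    """Sum the pairwise divergences over all pairs that involve each residue."""
    scores = np.zeros(n_residues)
    for k, (i, j) in enumerate(pairs):
        d = symmetric_kl(inv_dist_a[:, k], inv_dist_b[:, k])
        scores[i] += d
        scores[j] += d
    return scores


# Synthetic example: 1000 frames per macrostate, three residues, three pairs.
rng = np.random.default_rng(0)
pairs = [(0, 1), (0, 2), (1, 2)]
inv_a = 1.0 / rng.normal(10.0, 0.5, size=(1000, 3))
dist_b = rng.normal(10.0, 0.5, size=(1000, 3))
dist_b[:, 0] += 2.0  # only the residue 0-1 distance differs between the two states
inv_b = 1.0 / dist_b

scores = per_residue_kl(inv_a, inv_b, pairs, n_residues=3)
print(scores)
```

In this toy case residues 0 and 1 receive the largest scores, because they share the only pair whose distance distribution changes between the two synthetic macrostates.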

      (5) I suggest being more careful with the language of universality. It can be "supported" but "showing" or "proving" its universal would require looking at all possible chemicals in the class. 

      We thank the reviewer for the suggestion. In response, we have revised the manuscript to ensure that the language reflects that our findings are based on observations from a limited set of ligands, namely one NPS and one classical cannabinoid. We have replaced references to ligand groups (such as NPS or classical cannabinoid) with the specific ligand names (such as MDMB-FUBINACA or HU-210) to avoid claims of universality and prevent any potential confusion.

      Results and Discussion (Page 19)

      “In this work, we trained the network with the NPS (MDMB-FUBINACA), and classical cannabinoid (HU-210) bound unbiased trajectories (Method Section). Here, we compared the allosteric interaction weights between the binding pocket and the NPxxY motif which involves in triad interaction formation. Results show that each binding pocket residue in MDMB-FUBINACA bound ensemble shows higher allosteric weights with the NPxxY motif, indicating larger dynamic interactions between the NPxxY motif and binding pocket residues (Figure S9). The probability of triad formation was estimated to observe the effect of the difference in allosteric control. TRAM weighted probability calculation showed that MDMB-FUBINACA bound CB1 has the higher probability of triad formation (Figure 8A). Comparison of the pairwise interaction of the triad residues shows that interaction between Y397<sup>7.53</sup>-T210<sup>3.46</sup> is relatively more stable in case of MDMB-FUBINACA bound CB1, while other two interactions have similar behavior for both systems (Figures S10A, S10B, and S10C). Therefore, higher interaction between Y397<sup>7.53</sup> and T210<sup>3.46</sup> in MDMB-FUBINACA bound receptor causes the triad interaction to be more probable. 

      Furthermore, we also compared TM6 movement for both ligand bound ensemble which is another activation metric involved in both G-protein and β-arrestin binding. Comparison of TM6 distance from the DRY motif of TM3 shows similar distribution for HU-210 and MDMB-FUBINACA (Figure 8B). These observations support that NPS binding causes higher β-arrestin signaling by allosterically controlling triad interaction formation.” 

      Reviewer #2 (Public Review): 

      Summary: 

      The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances into cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques were employed to exploit GPCR-ligand dynamics. 

      Strengths: 

      The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true for many cases, especially those with skewed binding affinities. In this regard, the authors' usage of TRAM, which harnesses both biased and unbiased simulations for extracting the same, provides a more appropriate way out. 

      Weaknesses: 

      (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case. 

      We thank the reviewer for the comment. While we agree that the thermodynamic comparisons between MSM and TRAM provide similar values in this instance, we would like to emphasize the underlying reasoning behind our choice of TRAM.

      MSM can struggle to accurately estimate thermodynamic and kinetic properties in cases where local state reversibility (detailed balance) is not easily achieved with unbiased sampling. This is especially relevant in ligand unbinding processes, which often involve overcoming high free energy barriers. TRAM, by incorporating biased simulation data (such as umbrella sampling) in addition to unbiased data, can better achieve local reversibility and provide more robust estimates when unbiased sampling is insufficient.

      The similarity in thermodynamic estimates between MSM and TRAM in our study can be attributed to the relatively long unbiased sampling period (> 100 µs) employed. With sufficient sampling, MSM can approach detailed balance, leading to results comparable to those from TRAM. However, as we demonstrated in our manuscript (Figure 4D), when the amount of unbiased sampling is reduced, the uncertainties in both the thermodynamics and kinetics estimates increase significantly for MSM compared to TRAM. Thus, while MSM and TRAM perform similarly under the conditions of extensive sampling, TRAM's advantage lies in its robustness when unbiased sampling is limited or difficult to achieve. 

      (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues. 

      We thank the reviewer for the comment. We acknowledge that biased simulations could potentially introduce hysteresis or result in the identification of unphysical pathways. However, we believe this issue is mitigated by using well-tempered metadynamics, which gradually deposits a decaying bias. This approach enables the simulation to explore orthogonal directions of collective variable (CV) space, reducing the likelihood of hysteresis effects (Invernizzi, M. and Parrinello, M., JCTC, 2019[6]).

      Furthermore, there is precedent for using metadynamics-derived pathways to initiate unbiased simulations for constructing Markov State Models (MSMs). This methodology has been successfully applied in studying G-protein activation (Sun, X. et al., eLife, 2018[4]).

      Additional support for our observation can be found in two independent binding/unbinding studies of ligands from cannabinoid receptors, which discovered a similar pathway using different CVs (Saleh, et al., Angew. Chem., 2018[7]; Hua, T. et al., Cell, 2020[8]).

      (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much lesser biased simulation times, would fare against the experimental kinetics or even unbiased simulated kinetics of the previous report 

      We would like to address the reviewer's concerns regarding the choice of ligands, lack of direct experimental comparison, and the use of TRAM, and clarify our rationale point by point:

      Ligand Choice: The ligands selected for this study were chosen for their relevance and well-characterized binding properties. MDMB-FUBINACA is a well-known NPS ligand with documented binding properties, and it is still the only NPS ligand with an experimentally determined CB1-bound structure (Krishna Kumar, K. et al., Cell, 2019[9]). Similarly, the classical cannabinoid used in this study (HU-210) has established binding characteristics and is one of the earliest known synthetic classical cannabinoids. Therefore, these ligands serve as representative compounds within their respective categories, making them suitable for our comparative analysis.

      Experimental Comparison: We have indeed compared our simulation results to experimental data, focusing in particular on binding free energies. In the Results section, we have shown that the relative binding free energy estimated from our simulations aligns closely with the experimentally measured values. Additionally, the absolute binding energy estimates are within ~3 kcal/mol of the experimentally predicted value.

      TRAM Performance: TRAM-estimated free energies and rates have been benchmarked against experimental predictions in several studies in addition to ours (Peptide-protein binding: Paul, F. et al., Nat. Commun., 2017[2]; Ligand unbinding: Wu, H. et al., PNAS, 2016[10]). As the primary goal of this study is to compare ligand unbinding mechanisms, we believe benchmarking against other datasets, such as the D.E. Shaw GPCR/ligand binding paper, is not essential for this work.

      (4) The method section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations that were spawned from docked structure, instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures. 

      We thank the reviewer for the comment. We would like to clarify that we indeed used an experimentally derived pose for one of the ligands (MDMB-FUBINACA) as the cryo-EM structure of MDMB-FUBINACA bound to the protein was available (PDB ID: 6N4B) (Krishna Kumar K. et al., Cell, 2019[9]). However, as the cryo-EM structure had missing loops, we modeled these regions using Rosetta. We apologize for this confusion and have modified our method section to make this point clearer. 

      Regarding HU-210, we acknowledge that a crystallographic or cryo-EM structure for this specific ligand was not available. We selected HU-210 because it is the most commonly used example of a classical cannabinoid in the literature, with extensively studied thermodynamic properties. Importantly, our docking results for HU-210 align closely with previously experimentally determined poses for other classical cannabinoids (Figure S11) and replicate key polar interactions, such as those with S383<sup>7.39</sup>, which are characteristic of this class of compounds. 

      System Preparation (Page 22)

      “Modeling of this membrane proximal region was also performed with the Remodel protocol of Rosetta loop modeling. A distance constraint is added during this modeling step between C98<sup>N-term</sup> and C107<sup>N-term</sup> to create the disulfide bond between the residues. [74,76] 

      As the cryo-EM structure of MDMB-FUBINACA was known, ligand coordinate of MDMB-FUBINACA was added to the modeled PDB structure. The “Ligand Reader & Modeler” module of CHARMM-GUI was used for ligand (e.g., MDMB-FUBINACA) parameterization using CHARMM General Force Field (CGenFF).[77]”

      (5) The last part of using a machine learning-based approach to analyze allosteric interaction seems to be very much forced, as there are numerous distance-based more traditional precedent analyses that do a fair job of identifying an allosteric job. 

      We thank the reviewer for the valuable comment. The neural relational inference (NRI) method, which leverages a variational autoencoder (VAE) architecture, attempts to reconstruct the conformation (X) at time t + τ based on the conformation at time t. In doing so, it captures the non-linear dynamic correlations between residues in the VAE latent space. We chose this method because it does not rely on specific metrics such as distance or angle, making it potentially more robust in predicting allosteric effects between the binding pocket residues and the NPxxY motif.

      In response to the reviewer's suggestion, we have also performed a more traditional allosteric analysis by calculating the mutual information between the binding pocket residues and the NPxxY motif. Mutual information was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. Our results indicate that the mutual information between the binding pocket residues and the NPxxY motif is indeed higher for the NPS binding simulation (Figure S11).

      Method

      Mutual information calculation

      Mutual information was calculated on the same trajectory data as the NRI analysis. The Python package MDEntropy was used for estimating mutual information between the backbone dihedral angles of two residues. 

      Results and Discussion (Page 21)

      “To further validate our observations, we estimated allosteric weights between the binding pocket and the NPxxY motif by calculating mutual information between residue movements. Mutual information analysis reaffirms that allosteric weights between these residues are indeed higher for the MDMB-FUBINACA bound ensemble (Figure S11).”

      Mutual Information Estimation (Page 37)

      “Mutual information between dynamics of residue pairs was computed based on the backbone dihedral angles, as this provides a metric that is independent of the relative distances between residues. The calculations were done on same trajectory data as NRI analysis. Python package MDEntropy was used for estimating mutual information between backbone dihedral angles of two residues.[124]”
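
To make the dihedral-based mutual information concrete, here is a minimal NumPy sketch using a plain two-dimensional histogram estimator on synthetic angles. It does not use or reproduce the MDEntropy API; the bin count, the synthetic data, and the function name are assumptions for illustration only, and histogram estimators of this kind carry a small positive bias.

```python
import numpy as np


def dihedral_mutual_information(angles_x, angles_y, bins=24):
    """Histogram estimate of mutual information (in nats) between two angle series in radians."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    joint, _, _ = np.histogram2d(angles_x, angles_y, bins=[edges, edges])
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of the first angle
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of the second angle
    nonzero = p_xy > 0
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])))


# Synthetic dihedral series: one coupled to phi, one independent of it.
rng = np.random.default_rng(0)
n_frames = 5000
phi = rng.uniform(-np.pi, np.pi, n_frames)
coupled = (phi + rng.normal(0.0, 0.3, n_frames) + np.pi) % (2 * np.pi) - np.pi
independent = rng.uniform(-np.pi, np.pi, n_frames)

mi_coupled = dihedral_mutual_information(phi, coupled)
mi_independent = dihedral_mutual_information(phi, independent)
print(f"coupled: {mi_coupled:.3f} nats, independent: {mi_independent:.3f} nats")
```

Because the estimate depends only on the joint distribution of the two angles, it is unaffected by how far apart the residues are in space, which is the property the response highlights.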

      (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of two ligand binding mechanisms are. 

      We thank the reviewer for the insightful comment. In the manuscript, we discussed that the overall ligand (un)binding pathways are indeed similar for both ligands. Therefore, they interact with similar residues during the unbinding process. However, we have focused on two key differences in unbinding mechanism between the two ligands:

      (1) MDMB-FUBINACA exhibits two distinct unbinding mechanisms. In one, the linked portion of the ligand exits the receptor first. In the other mechanism, the ligand rotates within the pocket, allowing the tail portion to exit first. By contrast, for HU-210, we observe only a single unbinding mechanism, where the benzopyran ring leads the ligand out of the receptor. We have highlighted these differences in Figures 6 and 7 and discussed the intermediate states that appear along these different unbinding mechanisms. To clarify these differences further, we have added arrows to the free energy landscapes to highlight the distinct pathways.

      (2) In the bound state, a significant difference is observed in the interaction profiles. HU-210, a classical cannabinoid, forms strong polar interactions with TM7, while MDMB-FUBINACA shows weaker polar interactions with this region.

      We have discussed these differences in the Results and Discussion section (Page 13-18) & conclusion section (Page 23-24).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors should choose at least one case where the ligand's crystallographic pose is known and show how TRAM works in comparison to MSM or experimental report. 

      We thank the reviewer for the comment. We have used the experimentally determined cryo-EM pose for one of the ligands (i.e. MDMB-FUBINACA).  We have modified the manuscript to avoid confusion. (Please refer to the response of comment 4 of reviewer 2)

      (2) The authors should consider existing traditional methods that are used to detect allostery and compare their machine-learning-based approach to show its relevance. 

      We appreciate the reviewer’s comment. We have performed the traditional analysis by calculating mutual information between residue dynamics. We have shown that the traditional analysis agrees with the machine-learning-based NRI calculation. (Please refer to the response of comment 5 of reviewer 2)

      (3) Figure 3 doesn't provide a guide on the pathway of ligand. Without a proper arrow, it is difficult to surmise what is the start and end of the pathway. The figures should be improved. 

      We appreciate the reviewer’s suggestion. In response, we have revised Figure 3 to clearly indicate the ligand’s unbinding pathway by adding directional arrows and labeling the bound pose. Additionally, we have updated the figure caption to better clarify the color scheme used in the illustration. 

      (4) The Figure 5 presentation of free energetics has a very similar shape for the two ligands. More clarity is required on how these two ligands are different. 

      We thank the reviewer for the comment. While the overall shapes of the free energy profiles for the two ligands are indeed similar, this is expected as both ligands dissociate from the same pocket and follow a comparable pathway. However, key differences in their unbinding mechanisms arise due to variations in the ligand motion within the pocket. Specifically, the intermediate metastable minima in the free energy landscapes reflect these differences. For instance, in the NPS unbinding free energy landscape, the intermediate metastable state I1 corresponds to a conformation where the NPS ligand maintains a polar interaction with TM7, while the tail of the ligand has shifted away from TM5. This intermediate state is absent in the classical cannabinoid unbinding pathway, where no equivalent conformation appears in the landscape.  

      (6) Page 30: TICA is wrongly expressed as 'Time-independent component analysis'. It is not a time-independent process. Rather it is 'Time structured independent component analysis'. 

      We thank the reviewer for pointing this out. TICA should be expressed as Time-lagged independent component analysis or Time-structure independent component analysis. We have used the first expression and modified the manuscript accordingly.  

      (7) The manuscript's MSM theory part is quite well-known which can be removed and appropriate papers can be cited. 

      We thank the reviewer for the comment. We have removed the theory discussion of MSM and cited relevant papers.

      “Markov State Model

      Markov state model (MSM) is used to estimate the thermodynamics and kinetics from the unbiased simulation.[56,91] MSM characterizes a dynamic process using the transition probability matrix and estimates its relevant thermodynamics and kinetic properties from the eigendecomposition of this matrix. This matrix is usually calculated using either maximum likelihood or Bayesian approach.[56,97] The prevalence of MSM as a post-processing technique for MD simulations was due to its reliance on only local equilibration of MD trajectories to predict the global equilibrium properties.[92,93] Hence, MSM can combine information from distinct short trajectories, which can only attain the local equilibrium.[94–96]  

      The following steps are taken for the practical implementation of the MSM from the MD data. [4,17,98–100]”
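
As a hedged illustration of the MSM description quoted above (not the manuscript's own pipeline), the sketch below estimates a transition probability matrix from many short discrete trajectories by simple row-normalized counting, then reads the stationary distribution and implied timescales off its eigendecomposition. The three-state toy model, the non-reversible maximum-likelihood estimator, and all names are assumptions for illustration.

```python
import numpy as np


def estimate_msm(dtrajs, n_states, lag=1):
    """Row-normalized transition counts (non-reversible maximum-likelihood estimate)."""
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1)
    # Assumes every state appears at least once as the start of a transition.
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_and_timescales(T, lag=1):
    """Stationary distribution and implied timescales from the eigendecomposition of T."""
    eigvals, eigvecs = np.linalg.eig(T.T)  # left eigenvectors of T
    order = np.argsort(-np.abs(eigvals))
    pi = eigvecs[:, order[0]].real
    pi = pi / pi.sum()
    timescales = -lag / np.log(np.abs(eigvals[order[1:]]))
    return pi, timescales


def simulate(T, n_steps, rng):
    """Short discrete trajectory started from a uniformly random state."""
    traj = [rng.integers(len(T))]
    for _ in range(n_steps - 1):
        traj.append(rng.choice(len(T), p=T[traj[-1]]))
    return np.array(traj)


# Toy model whose exact stationary distribution is [0.25, 0.5, 0.25].
true_T = np.array([[0.90, 0.10, 0.00],
                   [0.05, 0.90, 0.05],
                   [0.00, 0.10, 0.90]])
rng = np.random.default_rng(0)
dtrajs = [simulate(true_T, 100, rng) for _ in range(500)]

T_est = estimate_msm(dtrajs, n_states=3)
pi, timescales = stationary_and_timescales(T_est)
print("stationary distribution:", np.round(pi, 3))
print("implied timescales (steps):", np.round(timescales, 2))
```

The trajectories start from a uniform, non-equilibrium distribution and are each far shorter than would be needed to equilibrate individually, yet the estimated stationary distribution is close to the exact one. This is the sense in which an MSM only needs local equilibration within states.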

      (8) A proper VAMP score-based analysis should be provided to show confidence in MSM's clustering metric and other hyperparameters. 

      We thank the reviewer for the recommendation. The VAMP-2 score-based analysis is discussed in the Methods section. We estimated the VAMP-2 score of MSMs built with different numbers of clusters and input TIC dimensions (Figure S15). The model with the best VAMP-2 score was selected for comparison with the TRAM result.
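
For readers unfamiliar with the score, the following is a minimal sketch of a VAMP-2 computation on mean-free features: the squared Frobenius norm of the whitened time-lagged covariance matrix. It is an assumption-laden illustration rather than the procedure used in the manuscript; some software packages add 1 to this value to account for the constant singular function, and in practice the score is usually evaluated on held-out data. The toy features and function names are hypothetical.

```python
import numpy as np


def _inv_sqrt(mat, eps=1e-10):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    keep = vals > eps  # drop numerically null directions
    return vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T


def vamp2_score(trajs, lag):
    """VAMP-2 score of feature trajectories (each array: frames x features) at a given lag."""
    x0 = np.vstack([t[:-lag] for t in trajs])
    xt = np.vstack([t[lag:] for t in trajs])
    x0 = x0 - x0.mean(axis=0)
    xt = xt - xt.mean(axis=0)
    n = len(x0)
    c00, ctt, c0t = x0.T @ x0 / n, xt.T @ xt / n, x0.T @ xt / n
    koopman = _inv_sqrt(c00) @ c0t @ _inv_sqrt(ctt)
    return float(np.sum(koopman ** 2))  # squared Frobenius norm


# Toy comparison: a slowly decorrelating feature versus white noise.
rng = np.random.default_rng(0)
n_frames = 5000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.95 * slow[t - 1] + rng.normal()
fast = rng.normal(size=n_frames)

score_slow = vamp2_score([slow[:, None]], lag=1)
score_fast = vamp2_score([fast[:, None]], lag=1)
print(f"slow feature: {score_slow:.3f}, white noise: {score_fast:.3f}")
```

A feature set that captures slow dynamics scores higher, which is why the score can be used to rank candidate cluster numbers and input dimensions.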

    1. Reviewer #1 (Public review):

      Summary:

      The paper presents a novel method for RSA, called trial-level RSA (tRSA). The method first constructs a trial x trial representation dissimilarity matrix using correlation distances, assuming that (as in the empirical example) each trial has a unique stimulus. Whereas "classical RSA" correlates the entire upper triangular matrix of the RDM / RSM to a model RDM / RSM, tRSA first calculates the correlation to the model RDM per row, and then averages these values. The paper claims that tRSA has increased sensitivity and greater flexibility than classical RSA.
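
To make the difference between the two procedures concrete, the sketch below computes both on synthetic data, following only the verbal description in this summary: a trial x trial RDM built from correlation distances, a single Spearman correlation over the upper triangle (classical RSA), and a per-row Spearman correlation that is then averaged (tRSA). The data-generating model, the tie-free rank helper, and all function names are assumptions; this is not the paper's implementation, which additionally fits a mixed-effects model to the per-trial values.

```python
import numpy as np


def rank(values):
    """Ranks of a 1-D array (assumes no ties, which holds for continuous data)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def correlation_rdm(patterns):
    """Trial x trial RDM using correlation distance; patterns has shape (trials, features)."""
    return 1.0 - np.corrcoef(patterns)


def classical_rsa(brain_rdm, model_rdm):
    """One Spearman correlation over the whole upper triangle."""
    iu = np.triu_indices_from(brain_rdm, k=1)
    return spearman(brain_rdm[iu], model_rdm[iu])


def trial_level_rsa(brain_rdm, model_rdm):
    """One Spearman correlation per row (diagonal excluded), plus their average."""
    n = brain_rdm.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    per_trial = np.array([
        spearman(brain_rdm[i, off_diag[i]], model_rdm[i, off_diag[i]]) for i in range(n)
    ])
    return per_trial, float(per_trial.mean())


# Synthetic data: brain patterns are a noisy linear readout of model features.
rng = np.random.default_rng(0)
n_trials, n_features, n_voxels = 30, 10, 50
features = rng.normal(size=(n_trials, n_features))
weights = rng.normal(size=(n_features, n_voxels))
brain = features @ weights + rng.normal(size=(n_trials, n_voxels))

model_rdm = correlation_rdm(features)
brain_rdm = correlation_rdm(brain)
crsa = classical_rsa(brain_rdm, model_rdm)
per_trial, trsa = trial_level_rsa(brain_rdm, model_rdm)
print(f"classical RSA: {crsa:.3f}, mean trial-level RSA: {trsa:.3f}")
```

Note that each dissimilarity enters two rows of the symmetric RDM, so the per-trial values are not independent of one another.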

      Strengths & Weaknesses:

      I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.

      Main issues:

      (1) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (pattern component modelling; Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is very related to RSA (Diedrichsen and Kriegeskorte 2017). The authors may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA in which there is more than one repetition per stimulus.

      (2) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. By this, the authors mean the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least 2 times, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs. 

      In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data comes from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors could spell out the specific problem addressed by their method right at the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".
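
A minimal sketch of the repeated-measures idea, assuming zero-mean patterns, noise that is independent across repeats, and synthetic data: with y_{j,n} = u_j + e_{j,n}, the product of two different measurements of the same stimulus has expectation sigma_u^2, so averaging such cross-repeat products gives an unbiased estimate of the signal variance. This is one simple variant of the estimator referred to in this comment, not a reproduction of any published toolbox; all names and parameter values are hypothetical.

```python
import numpy as np


def signal_variance_estimate(y):
    """Cross-repeat estimate of sigma_u^2; y has shape (stimuli, repeats, voxels)."""
    n_rep = y.shape[1]
    products = [
        np.mean(y[:, a, :] * y[:, b, :])  # noise terms cancel in expectation for a != b
        for a in range(n_rep) for b in range(a + 1, n_rep)
    ]
    return float(np.mean(products))


# Synthetic data with weak signal (variance 0.25) and strong noise (variance 4.0).
rng = np.random.default_rng(0)
n_stim, n_rep, n_vox = 40, 2, 500
sigma_u, sigma_e = 0.5, 2.0
u = rng.normal(0.0, sigma_u, size=(n_stim, 1, n_vox))      # true stimulus patterns
e = rng.normal(0.0, sigma_e, size=(n_stim, n_rep, n_vox))  # trial-specific noise
y = u + e

estimate = signal_variance_estimate(y)
naive = float(np.var(y))  # mixes signal and noise variance
print(f"cross-repeat estimate: {estimate:.3f} (true {sigma_u ** 2:.3f}); naive variance: {naive:.3f}")
```

With a single repetition per stimulus there is no pair of independent measurements to multiply, which is why signal and noise cannot be separated directly in that design.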

      (3) The notation in the paper is often conflicting and should be clarified. The true and measured activity patterns should each receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that $\sigma_{ijk}$ denotes the noise variances (not standard deviations)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as seems to be done here).

      (4) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.
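
The positive semi-definiteness point can be checked numerically. The sketch below is a simplified stand-in for the simulation criticized here: it fills the cells of a single similarity matrix with independent normal draws (rather than drawing model and brain cells jointly), and compares the smallest eigenvalue with that of a correlation matrix computed from actual patterns. Matrix size, variances, and names are assumptions for illustration.

```python
import numpy as np


def min_eigenvalue(mat):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(mat).min())


rng = np.random.default_rng(0)
n = 20

# (a) "RSM" whose off-diagonal cells are drawn independently, clipped to [-1, 1].
cells = np.clip(rng.normal(0.0, 0.5, size=(n, n)), -1.0, 1.0)
independent_rsm = np.triu(cells, k=1)
independent_rsm = independent_rsm + independent_rsm.T + np.eye(n)

# (b) RSM computed from patterns (trials x voxels), a valid correlation matrix.
patterns = rng.normal(size=(n, 100))
pattern_rsm = np.corrcoef(patterns)

print(f"independent cells: min eigenvalue = {min_eigenvalue(independent_rsm):.3f}")
print(f"pattern-derived:   min eigenvalue = {min_eigenvalue(pattern_rsm):.3f}")
```

A negative smallest eigenvalue means that no set of activity patterns could have produced the matrix, so simulations built on such matrices cannot correspond to any real data-generating process.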

      (5) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.
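
The first step of this argument can be illustrated with synthetic patterns: with i.i.d. patterns, all pairwise correlations hover around zero, whereas adding a shared condition-specific pattern raises the correlations within a condition but not between conditions. The sketch below shows only this change in correlation structure, under assumed sizes and variances; it does not model the downstream effect on the tRSA estimates.

```python
import numpy as np


def mean_pair_correlations(patterns, labels):
    """Mean correlation for same-condition and different-condition stimulus pairs."""
    r = np.corrcoef(patterns)
    iu = np.triu_indices_from(r, k=1)
    same = (labels[:, None] == labels[None, :])[iu]
    return float(r[iu][same].mean()), float(r[iu][~same].mean())


rng = np.random.default_rng(0)
n_per_cond, n_vox = 20, 200
labels = np.repeat([0, 1], n_per_cond)

iid_patterns = rng.normal(size=(2 * n_per_cond, n_vox))  # no condition-specific pattern
cond_patterns = rng.normal(size=(2, n_vox))              # one shared pattern per condition
with_condition = iid_patterns + cond_patterns[labels]

within_iid, between_iid = mean_pair_correlations(iid_patterns, labels)
within_cond, between_cond = mean_pair_correlations(with_condition, labels)
print(f"i.i.d. patterns:        within = {within_iid:.3f}, between = {between_iid:.3f}")
print(f"with condition pattern: within = {within_cond:.3f}, between = {between_cond:.3f}")
```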

      (6) The trial-level brain RDM to model Spearman correlations was analyzed using a mixed effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, although independence is an assumption of the mixed effects model. This does not seem to increase Type I errors in the conditions studied, but the procedure still needs a clear justification.

      (7) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.
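
      The difference between the two statistics in point (7) can be made concrete with a toy example. This is a sketch under stated assumptions, not the tRSA implementation: the row-wise variant is assumed to exclude the diagonal and to average the per-row correlations, the 4 x 4 matrices are made up, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def matrix_wise(model, data):
    """One Spearman correlation over all upper-triangle cells."""
    n = len(model)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return spearman([model[i][j] for i, j in pairs], [data[i][j] for i, j in pairs])


def row_wise(model, data):
    """Mean of the per-row Spearman correlations, excluding the diagonal."""
    n = len(model)
    rs = []
    for i in range(n):
        cols = [j for j in range(n) if j != i]
        rs.append(spearman([model[i][j] for j in cols], [data[i][j] for j in cols]))
    return sum(rs) / n


def symmetric(upper, n=4):
    """Build a symmetric n x n matrix (zero diagonal) from upper-triangle values."""
    m = [[0.0] * n for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for (i, j), v in zip(pairs, upper):
        m[i][j] = m[j][i] = float(v)
    return m


model = symmetric([1, 2, 3, 4, 5, 6])
data = symmetric([1, 2, 4, 3, 5, 6])  # cells (0,3) and (1,2) swap rank order

if __name__ == "__main__":
    print(row_wise(model, data), matrix_wise(model, data))
```

      In this toy case the data preserve the model's ordering within every row, so the row-wise statistic is at its maximum, while the matrix-wise statistic is lower because the ordering across rows differs. The two measures therefore answer related but different questions.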

      (8) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.

      (9) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli. Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here. One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.
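
      The basic idea behind the cross-validated distances mentioned in point (9) can be sketched as follows. This is a simplified illustration, not the estimator used in any particular toolbox: it assumes two conditions with an identical true pattern, two runs with independent Gaussian noise, and no noise normalization; the function name and parameters are hypothetical.

```python
import random


def simulate_distances(n_voxels=50, n_sims=500, noise_sd=1.0, seed=0):
    """Average naive and cross-validated squared distances when the true distance is zero."""
    rng = random.Random(seed)
    naive_sum = 0.0
    cv_sum = 0.0
    for _ in range(n_sims):
        true_pattern = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

        def measure():
            # Both conditions share the same true pattern; only the noise differs.
            return [t + rng.gauss(0.0, noise_sd) for t in true_pattern]

        a1, b1 = measure(), measure()  # run 1: conditions A and B
        a2, b2 = measure(), measure()  # run 2: independent noise
        diff1 = [x - y for x, y in zip(a1, b1)]
        diff2 = [x - y for x, y in zip(a2, b2)]
        # Naive: squared difference within one run (noise is squared, so it adds up).
        naive_sum += sum(d * d for d in diff1) / n_voxels
        # Cross-validated: product of differences from independent runs.
        cv_sum += sum(d1 * d2 for d1, d2 in zip(diff1, diff2)) / n_voxels
    return naive_sum / n_sims, cv_sum / n_sims


if __name__ == "__main__":
    naive, cross_validated = simulate_distances()
    print(naive, cross_validated)
```

      Because the noise in the two runs is independent, the cross-validated estimate averages to roughly zero when the true distance is zero, whereas the naive estimate is inflated by the noise variance.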

      References:

      Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789

      Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

      Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.

      Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. eLife, 12. https://doi.org/10.7554/eLife.82566

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their positive and constructive comments on the manuscript. In the revised manuscript we addressed these comments, which we believe have improved the quality of our work.

      In summary:

      (1) We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, which would make the tool accessible to a wider user base; however, these additions fall outside the primary scope of our current work, which is to provide an analytical framework for IVM data after segmentation and tracking. Developing open-source segmentation and tracking tools represents a substantial undertaking in its own right, which has been comprehensively explored in other studies (e.g. https://doi.org/10.4049/jimmunol.2100811; https://doi.org/10.7554/eLife.60547; https://doi.org/10.1016/j.media.2022.102358; https://doi.org/10.1038/s41592-024-02295-6 - now cited in our revised manuscript).

      In our analyses, we used data processed with Imaris, a commercial software that, despite its limitations, is widely used by the intravital microscopy community due to its user-friendly platform for 3D image visualization and analysis. Nevertheless, recognizing the need for compatibility with tracking data from various pipelines, we have modified our tool to accept other data formats, such as those generated by open-source Fiji plugins like TrackMate, MTrackJ, ManualTracking (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). These updates are available in our GitHub repository and are described in the revised manuscript. 

      (2) We appreciate reviewer #3's suggestion to incorporate additional features into our analytical pipeline. In response, we have updated the GitHub repository to allow users to input and select which features (dynamic, morphological, or spatial) they wish to include in the analysis (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#feature-selection). In the revised manuscript, we highlight this new functionality and provide examples using alternative datasets to demonstrate the application of these features.

      (3)  We appreciate the constructive feedback of reviewers #1 and #2 regarding the statistical analysis and interpretation of the data presented in Figures 3 and 4. We understand the importance of clarity and rigor in data analysis and presentation, and we addressed the concerns raised in the revised version of the manuscript.

      (4) We appreciate reviewer #1's suggestion regarding the inclusion of demo data, as we believe it would greatly enhance the usability of our pipeline. We acknowledge that this was an oversight on our part. To address this, we have now added demos to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the revised manuscript, we reference this addition and present new figures with examples of these demos processing different IVM datasets (2D/3D, different tumors and healthy tissues). Additionally, we have provided processed DMG IVM movie samples in an imaging repository.

      (5) Finally, we made some small changes to the manuscript based on the reviewers’ feedback.

      Below we provide a point-by-point response to the reviewers’ comments

      Reviewer #1 (Public review):

      Comment #1: A key limitation of the pipeline is that it does not overcome the main challenges and bottlenecks associated with processing and extracting quantitative cellular data from timelapse and longitudinal intravital images. This includes correcting breathing-induced movement artifacts, automated registration of longitudinal images taken over days/weeks, and accurate, automated segmentation and tracking of individual cells over time. Indeed, there are currently no standardised computational methods available for IVM data processing and analysis, with most laboratories relying on custom-built solutions or manual methods. This isn't made explicit in the manuscript early on (described below), and the researchers rely on expensive software packages such as IMARIS for image processing and data extraction to feed the required parameters into their pipeline. This limitation unfortunately reduces the likely impact of BEHAV3D-TP on the IVM field. 

      As highlighted above, the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence, and displacement) from intravital images. Indeed, to use the tool researchers must first extract dynamic cellular parameters from their IVM datasets, requiring access to expensive software (e.g. IMARIS as used here) and/or above-average computational expertise to develop and use custom-made open-source solutions. This limitation is not made explicit or discussed in the text.

      We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work and represent a substantial undertaking in their own right. Several studies (e.g., Diego Ulisse Pizzagalli et al., J Immunol (2022); Aby Joseph et al., eLife (2020); Molina-Moreno et al., Medical Image Analysis (2022); Hidalgo-Cenalmor et al., Nat Methods (2024); Ershov et al., Nat Methods (2022)) have comprehensively addressed these topics, and we now reference them in the revised manuscript to provide readers with relevant background.

      The objective of our manuscript is not to develop a complete segmentation or tracking pipeline but rather to introduce an analytical framework capable of extracting enhanced insights from the data generated by existing tools. This goal arises from our observations of the field: despite significant investment in image processing, researchers often rely on simplistic approaches, such as averaging single parameters across conditions, which can obscure tumor heterogeneity and spatial behavioral dynamics within the tumor microenvironment.

      Our current tool focuses on providing this much-needed analytical capability. For our analysis we used Imaris, a widely utilized software in the intravital microscopy (IVM) community, known for its intuitive 3D visualization and analysis platform despite certain limitations. 

      In our own literature search of recent IVM studies published by leading laboratories in high-impact journals, we found that close to half used Imaris, while the remainder primarily relied on manual workflows with Fiji plugins. Thus, we consider it valuable to offer a pipeline compatible with such commonly used software, given its prevalence in the field.

      However, following the suggestion of the reviewer, and to enhance the tool’s flexibility and compatibility, we have expanded the pipeline to accept data formats generated by open-source Fiji plugins, such as TrackMate, MTrackJ, and ManualTracking. These updates are detailed in the revised manuscript and are implemented in our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ), where we also provide several demos using TrackMate and Imaris processed data. This addition demonstrates our tool's capability to integrate with segmented and tracked datasets from diverse platforms, increasing its applicability to a broader range of researchers using both commercial and open-source pipelines.

      Comment #2: The number of cells (e.g. per behavioural cluster), and the number of independent mice, represented in each result figure, is not included in the figure legends and are difficult to ascertain from the methods.

      We appreciate the reviewer's constructive feedback regarding the clarity of the number and type of replicates used in our analyses. In the revised manuscript, we have added detailed information on replicates to the figure legends, including the number of independent mice represented in each figure, to ensure transparency. Regarding the number of cells, we have indicated the total number of processed cells in the Figure 2b legend (953 cells). Additionally, we have now included figures (Sup Fig 4c, Sup Fig 5e-g, Fig 5c,e, Sup Fig 6 c,d) for each cluster, in which individual dots represent individual cell tracks, with color indicating the position and shape indicating the mouse.

      Comment #3: The data used to test the pipeline in this manuscript is currently not available, making it difficult to assess its usability. It would be important to include this for researchers to use as a 'training dataset'.

      As stated above, we acknowledge that this was an oversight on our part and thank the reviewer for pointing it out. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/main/demo_datasets) and referenced this addition in the Data availability section of the revised manuscript. Since the pipeline now also supports data processed with Fiji, we provide four demo datasets: one processed with Imaris in 3D, one with CellPose 2.0 and TrackMate in 2D, one with µSAM and TrackMate in 3D, and one manually processed with MTrackJ in 2D. Moreover, we now provide Imaris-processed DMG IVM movie samples in an open-source repository.

      Comment #4: Precisely how the BEHAV3D-TP large-scale phenotyping module can map large-scale spatial phenotyping data generated using LSR-3D imaging data and Cytomap to 3D intravital imaging movies is unclear. Further details in the text and methods would be beneficial to aid understanding.

      We appreciate the reviewer’s comment and in the revised manuscript we have now provided details in the methods section “Tumor large-scale spatial phenotyping with Cytomap” to clarify how the BEHAV3D-TP module maps LSR-3D and Cytomap data to 3D intravital imaging movies:

      “To map the assigned regions onto IVM movies, a 3D image of the cluster distribution within the tumor was generated and exported for each sample (Figure Supplement 5a). Next, regions within the IVM movies were visually matched to the corresponding regions identified by the Large-Scale Phenotyping module of Cytomap (Figure 3c). For each mouse, at least one or two representative positions per matched region type were selected, cropped, and analyzed to assess tumor cell behavior, following the previously described cell tracking methodology (Imaris Cell tracking).”

      Moreover, we updated Figure 3 c to further clarify these steps.

      Comment #5: The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. Conclusions should therefore be tempered in the absence of additional experiments and controls. 

      We appreciate the reviewer’s comment and acknowledge that our conclusions should be tempered due to the preliminary nature of our evidence. In the revised version of the manuscript we have revised our conclusions accordingly and emphasize the necessity for additional experiments and controls to further validate our findings on DMG cell migratory behaviors and their relationship with the tumor microenvironment.

      In discussion: “While our findings suggest that microenvironmental factors may influence tumor cell migration, further studies will be necessary to establish causal relationships. Additional experimental validation, such as macrophage ablation experiments, could help clarify the specific contributions of these factors.”

      Reviewer #1 (Recommendations for the authors): 

      (1) To test the ability of the pipeline to identify relevant patterns of migratory behaviours additional 'control' experiments would be helpful e.g. comparing non-invasive vs invasive tumour cell lines, artificially controlling migratory behaviours of cells such as implanting beads soaked in factors that would attract/repel cells? 

      (2) Does the pipeline work well for a variety of cell types/contexts? e.g. can it identify and cluster more subtle migratory behaviours such as non-tumour cells during tissue development or regeneration conditions? 

      We appreciate the reviewer’s valuable suggestions. In the revised manuscript, we have included additional examples demonstrating the capability of our pipeline to investigate heterogeneous cell behavior across two additional experimental setups:

      (1) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from breast cancer cell lines with varying migratory capacities (DOI: 10.1016/j.yexcr.2019.04.009). In these datasets, our pipeline extends beyond predefined characteristics based solely on speed, enabling the identification of distinct cell populations. Notably, our analysis reveals that the breast cancer lines exhibit different proportions of different migratory behaviors such as Fast, Intermediate, Very slow and Static (Supplementary Fig 1).

      (2) We have now evaluated our BEHAV3D TP heterogeneity module using IVM data from healthy breast epithelial cells (DOI: 10.1016/j.celrep.2024.115073), where we identify distinct morphodynamic epithelial cell populations in the terminal end bud of the mammary gland that are distributed differently among hormone receptor-positive (HR+) and HR- terminal end bud cells.

      (3) To support biological conclusions, could the authors show that ablating tumour-associated macrophages or vasculature alters the migratory patterns of nearby tumour cells?

      We appreciate the reviewer's suggestion regarding the potential effects of ablating tumor-associated macrophages or vasculature on the migratory patterns of nearby tumor cells. While these experiments would functionally validate the observations made by our method, we would like to clarify that the primary focus of our study was on the development and application of computational tools for behavioral analysis and thus we consider that delving deeper in understanding the biology behind our observation is out of the scope of the current study. However, as mentioned previously, we have carefully tempered our conclusions to acknowledge the limitations of our current study. In the revised manuscript, we explicitly highlight that experiments involving the ablation of tumor-associated macrophages or vasculature would be crucial for further understanding the biological relevance of our findings.

      Minor corrections to text: 

      (4) Line 63 - are references formatted correctly?

      Thank you for pointing out this error. We have corrected it in the revised manuscript.

      (5) Lines 161 -162 - 'intravitally imaged' used twice in a sentence.

      Thank you for pointing out the typo. We have corrected it in the revised manuscript.

      Reviewer #2 (Public review):

      Comment#1: The strength of democratizing this kind of analysis is undercut by the reliance upon Imaris for segmentation, so it would be nice if this was changed to an open-source option for track generation.

      As noted in our previous response to Reviewer #1, we would like to point out that although Imaris is a commercial software, it is widely used in the intravital microscopy community due to its user-friendly interface. We conducted a literature review to evaluate this aspect and below we include references from leading laboratories in the IVM field that utilize Imaris. One of its key advantages, which we also utilized, is semi-automated data tracking that allows for manual corrections in 3D—a process that can be more challenging in other open-source software with less effective data visualization.

      However, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support 2D and 3D data formats generated by open-source Fiji plugins like TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In the revised manuscript, we describe the new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module across various IVM datasets, processed in both 2D and 3D with different processing pipelines (Supplementary Fig 1-3). This includes CellPose 2.0 and the novel 'Segment Anything' model, followed by TrackMate tracking, applied to both tumor and healthy IVM data. Moreover, we have developed a new web application that integrates morphological and tracking information from Segment Anything segmentation and TrackMate tracking, depicted in Supplementary Fig 3 a (https://morphotrack-merger.streamlit.app/). Additionally, we have updated the introduction to better clarify the scope of our study and include references to existing image processing solutions.

      Comment#2: The main issue is with the interpretation of the biological data in Figure 3 where ANOVA was used to analyse the proportional distribution of different clusters. Firstly the n is not listed so it is unclear if this represents an n of 3 where each mouse is an individual or whether each track is being treated as a test unit. If the latter this is seriously flawed as these tracks can't be treated as independent. Also, a more appropriate test would be something like a Chi-squared test or Fisher's exact test. Also, no error bars are included on the stacked bar graphs making interpretation impossible. Ultimately this is severely flawed and also appears to show very small differences which may be statistically different but may not represent biologically important findings. This would need further study.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the biological data in Figure 3. 

      To clarify, each imaged position is considered an independent biological replicate (n = 18 from a total of 6 mice). We acknowledge that the description of the statistical methods and the experimental units was not sufficiently clear in the previous version. In our original submission, we used an ANOVA to test whether the proportion of each behavioral cluster differed across the tumor microenvironment regions. Post hoc pairwise comparisons were performed using Tukey’s test, with the results shown in Supplementary Figure 2d (currently Fig 3d). However, we agree with the reviewer that this approach may be misleading when paired with stacked bar plots that lack error bars, as it can obscure individual variability and does not explicitly represent statistical uncertainty.

      In the revised manuscript, we present the data as boxplots with individual data points, where each dot represents an imaged position, and the shape corresponds to a specific mouse. In Figure 3 d the y-axis displays the normalized percentage of each cluster across TME regions, expressed as z-scores. This normalization corrects for inter-mouse variability and facilitates a comparison of the relative distribution of clusters across TME regions, independent of the overall abundance differences between mice. We performed an ANOVA with Tukey's post hoc test for each individual behavioral cluster to assess differences across TME regions. Additionally, for transparency, in Supplementary Figure 5 d we provide the raw percentage values. The legends provide the number of positions and mice included in the analysis. 
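
      The normalization described in this response can be sketched as follows. The exact grouping used for the z-scores is not stated here, so the sketch assumes that cluster percentages are z-scored within each mouse across its imaged positions, using the sample standard deviation; the record keys, region labels and numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_mouse(records):
    """Z-score the `percent` field separately within each mouse.

    `records` is a list of dicts with hypothetical keys:
    mouse, position, region, percent (percentage of one behavioral cluster).
    """
    by_mouse = defaultdict(list)
    for rec in records:
        by_mouse[rec["mouse"]].append(rec["percent"])

    out = []
    for rec in records:
        values = by_mouse[rec["mouse"]]
        sd = stdev(values) if len(values) > 1 else 0.0
        # A mouse with a single position or no spread gets z = 0 by convention.
        z = (rec["percent"] - mean(values)) / sd if sd > 0 else 0.0
        out.append({**rec, "z": z})
    return out


# Made-up numbers: mouse m2 has a higher overall abundance than m1.
records = [
    {"mouse": "m1", "position": 1, "region": "A", "percent": 10.0},
    {"mouse": "m1", "position": 2, "region": "B", "percent": 20.0},
    {"mouse": "m1", "position": 3, "region": "C", "percent": 30.0},
    {"mouse": "m2", "position": 1, "region": "A", "percent": 40.0},
    {"mouse": "m2", "position": 2, "region": "B", "percent": 50.0},
    {"mouse": "m2", "position": 3, "region": "C", "percent": 60.0},
]
scored = zscore_within_mouse(records)

if __name__ == "__main__":
    for rec in scored:
        print(rec["mouse"], rec["region"], rec["z"])
```

      After this step the two hypothetical mice show the same relative pattern across regions even though their raw percentages differ, which is the kind of inter-mouse offset the normalization is meant to remove.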

      Comment#3:  Figure 4 has similar statistical issues in that the n is not listed and, again, it is unclear whether they are treating each cell track as independent which, again, would be inappropriate. The best practice for this type of data would be the use of super plots as outlined in Lord et al. (2020) JCI - SuperPlots: Communicating reproducibility and variability in cell biology.

      We appreciate the reviewer’s comments and suggestions regarding Figure 4. In this case, because we are comparing the overall features of the behavioral clusters, each individual cell is treated as a unit. In the revised manuscript, we have clarified this point in the figure legend and incorporated plots in Figure 4c and 4e, indicating the mouse and imaging position each data point originates from. This enhances the visualization of reproducibility and variability in our data, demonstrating that the results are consistent across multiple mice and positions and are not driven by a single mouse or imaging position.

      Comment#4: The main issue that this raises is that the large-scale phenotyping module and the heterogeneity module appear designed to produce these statistical analyses that are used in these figures and, if they are based on the assumption that each track is independent, then this will produce inappropriate analyses as a default.

      We appreciate the reviewer’s comment, although we are unclear about the specific concern being raised. To clarify, in our large-scale phenotyping analysis, each position is assigned to a TME niche based on the CytoMAP analysis and the workflow outlined in Figure 3c. Multiple positions are imaged per mouse. For each position, we measure the proportion of tumor cells exhibiting a specific behavioral phenotype, and these proportions are subsequently used for statistical analysis (Figure 3 d). 

      In contrast, in Supplementary Fig. 5e-g, we treat each cell track as an individual unit, grouping them by their assigned large-scale region. Here, we assess whether differences between regions can be detected using a conventional single-feature analysis—a more traditional approach. However, we find that this method loses important behavioral patterns and distinctions that BEHAV3D-TP captures.

      We hope that this explanation, along with the modifications made to the figures and figure legends, provides greater clarity.  
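
      The aggregation step described in this response, in which cell tracks are collapsed to one set of proportions per imaged position before any statistics are run, can be sketched as follows. This is an illustration only, not code from the BEHAV3D-TP repository: the function name, the tuple layout and the example tracks are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_proportions(tracks):
    """Collapse cell tracks into one row of cluster proportions per imaged position.

    `tracks` is a list of (mouse, position, cluster) tuples, one per cell track.
    Returns {(mouse, position): {cluster: proportion}}.
    """
    counts = defaultdict(Counter)
    for mouse, position, cluster in tracks:
        counts[(mouse, position)][cluster] += 1

    proportions = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        proportions[key] = {c: n / total for c, n in counter.items()}
    return proportions


# Made-up tracks: four from one position, two from another.
tracks = [
    ("m1", "pos1", "Invading"), ("m1", "pos1", "Invading"),
    ("m1", "pos1", "Static"), ("m1", "pos1", "Static"),
    ("m1", "pos2", "Static"), ("m1", "pos2", "Retreating"),
]
props = cluster_proportions(tracks)

if __name__ == "__main__":
    for key, value in props.items():
        print(key, value)
```

      With this layout the number of rows entering a test equals the number of imaged positions rather than the number of tracks, so a position with many tracks does not carry more weight than one with few.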

      Reviewer #3 (Public review):

      Comment #1: The most challenging task of analyzing 3D time-lapse imaging data is to accurately segment and track the individual cells in 3D over a long time duration. BEHAV3D Tumor Profiler did not provide any new advancement in this regard, and instead relies on commercial software, Imaris, for this critical step. Imaris is known to have a very high error rate when used for analyzing 3D time-lapse data. In the Methods section, the authors themselves stated that "Tumor cell tracks were manually corrected to ensure accurate tracking". Based on our own experience of using Imaris, such manual correction is tedious and often required for every time step of the movie. Therefore, Imaris is not a satisfactory tool for analyzing 3D time-lapse data. Moreover, Imaris is expensive and many research labs probably can't afford to buy it. The fact that BEHAV3D Tumor Profiler critically depends on the faulty ImarisTrack module makes it unclear whether the BEHAV3D tool or the results are reliable.

      If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.

      We appreciate the reviewer’s comments on the challenges of segmenting and tracking individual cells in 3D time-lapse imaging data. As mentioned previously (please refer to comment #1 to reviewer #1), our primary focus is to develop an analytical tool for comprehensive data analysis rather than developing tools for image processing. However, to enhance accessibility, we have updated our tool to support data formats from open-source Fiji plugins, such as TrackMate, which will benefit users without access to commercial software (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In Supplementary Figures 1, 2, and 3, we present IVM data from different sources, processed using three distinct methods: MTrackJ (Supplementary Fig. 1), Cellpose + TrackMate (Supplementary Fig. 2), and µSAM + TrackMate (Supplementary Fig. 3). The latter two represent state-of-the-art deep learning approaches.

      On the other hand, while we recognize the limitations of Imaris, it remains widely used in the intravital microscopy community due to its user-friendly interface for 3D visualization and semi-automated segmentation capabilities. Since no perfect tracking method currently exists, we initially utilized Imaris for its ability to allow manual correction of faulty tracks, ensuring the reliability of our results. This approach is not only widely used (see above) but was also the best available option when we began our analysis, allowing us to obtain accurate results efficiently.

      In the revised manuscript, we clarify the scope of our study and provide information on both Imaris and alternative processing options to strengthen the reliability of our findings:

      In introduction: “While significant efforts have been made to develop open-source segmentation and tracking tools for live imaging data, including IVM [22–27], fewer tools exist for the unbiased analysis of tumor dynamics. One major barrier is that implementing such analytical methods often requires substantial computational expertise, limiting accessibility for many biomedical researchers conducting IVM experiments. To bridge this gap, we present BEHAV3D Tumor Profiler (BEHAV3D-TP), a robust, user-friendly tool that allows researchers to extract meaningful insights from dynamic cellular behaviors without requiring advanced programming skills.”

      In the Methods, we now describe not only the Imaris processing pipeline but also the µSAM segmentation pipeline, and we reference CellPose IVM processing; both are combined with TrackMate for tracking. Additionally, to integrate morphological information from µSAM with tracking data from TrackMate, we developed a web tool to merge the outputs from both processing steps: https://morphotrack-merger.streamlit.app/

      Comment #2: The authors developed a "Heterogeneity module" to extract distinctive tumor migratory phenotypes from the cell tracks quantified by Imaris. The cell tracks of the individual tumor cells are all quite short, indicating relatively low motility of the tumor cells. It's unclear whether such short migratory tracks are sufficient to warrant the PCA analysis to identify the 7 distinctive migratory phenotypes shown in Figure 2d. It's also unclear whether these 7 migratory phenotypes correspond to unique functional phenotypes.  

      For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.

      While some tumor cells exhibit limited motility, indicated by short tracks, others demonstrate significant migratory capabilities (Figure 2 Invading and Retreating cells). This variability in tumor cell behavior is a central focus of our analysis, and our tool is specifically designed to identify and distinguish these differences. Our PCA analysis effectively captures this variability, as illustrated in Figure 2 d-f. It differentiates between cells exhibiting varying degrees of migratory behavior, including both highly and less migratory phenotypes, as well as their directionality relative to the tumor core and the persistence of their movements. Thus, we believe that our approach provides valuable insights into the distinct migratory phenotypes within the tumor microenvironment. 

      While our current manuscript does not provide explicit evidence linking each motility cluster to functional differences among the tumor cells, it is important to note that the state of the field supports the idea that cell dynamics can predict cell states and phenotypes. Research conducted by ourselves (Dekkers, Alieva et al., Nat Biotech, 2023) and others, such as Crainiciuc et al. (Nature, 2022) and Freckmann et al. (Nat Comm, 2022), has shown that variations in cell motility patterns are indicative of underlying functional characteristics. For instance, cell morphodynamic features have been shown to reflect differences in cell types, T cell targeting states (Dekkers, Alieva et al., Nat Biotech, 2023), immune cell types (Crainiciuc et al., Nature, 2022), tumor metastatic potential, and drug resistance states (Freckmann et al., Nat Comm, 2022). In the revised manuscript, we have referenced relevant studies to underscore the biological significance of these behaviors. By doing so, we hope to clarify the potential implications of our findings and strengthen the overall narrative of our research:

      In discussion: “While our current study does not provide direct functional validation of the distinct motility clusters identified, existing literature strongly supports the notion that cell dynamics can serve as a proxy for functional states and phenotypic heterogeneity. Prior work, including studies by our group[19,66]  as well as Crainiciuc et al.[35] and Freckmann et al.[20], has demonstrated that variations in cell motility patterns can reflect underlying functional characteristics. Specifically, cell morpho-dynamic features have been shown to correlate with differences in cell type identity, T-cell engagement, metastatic potential, and drug resistance states. This growing body of evidence suggests that tumor cell behavior, as captured by BEHAV3D-TP, may serve as a predictive tool for deciphering functional tumor heterogeneity. Future studies integrating transcriptomic or proteomic profiling of motility-defined subpopulations could further elucidate the biological significance of these behavioral phenotypes.”

      Comment #3: Using only motility to classify tumor cell behaviours in the tumor microenvironment (TME) is probably not sufficient to capture the tumor cell difference. There are also other non-tumor cell types in the TME. If the authors aim to develop a computational tool that can elucidate tumor cell behaviors in the TME, they should consider other tumor cell features, e.g., morphology, proliferation state, and tumor cell interaction with other cell types, e.g., fibroblasts and distinct immune cells.

      The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We believe that using dynamic features alone is sufficient to capture differences in tumor behavior, as demonstrated by our results in Figure 2. However, we appreciate the reviewer’s suggestion to consider additional features, such as cell morphology, to fine-tune our analyses. To this end, we have adapted our pipeline to be compatible with any dynamic, morphological or spatial features present in the data. In the revised manuscript, we showcase this new addition with the analyses of two new datasets: 2D IVM data from healthy epithelial breast cells (Supplementary Fig 2) and 3D IVM data from adult gliomas (Supplementary Fig 3). These analyses identified cells with specific morphodynamic characteristics, which exhibited distinct kinetic behaviors or spatial distributions.

      However, we would like to point out that not all features provide informative insights and that a wide range of features can instead introduce biologically irrelevant noise, making interpretation more challenging. For instance, in 3D microscopy, the z-axis resolution is typically lower, which can lead to artifacts such as elongation in that direction. Adding morphological features that capture this may skew the analysis. Therefore, we believe that incorporating additional features should be approached with caution. We clarify these considerations in the revised manuscript to better guide users in utilizing our computational tool effectively:

      In discussion: “In addition to motility-based classification, features such as tumor cell morphology, proliferation state, and interactions with the tumor microenvironment can further refine tumor phenotyping. BEHAV3D-TP allows for the selection of diverse feature types, supporting datasets that include both dynamic, morphological and spatial parameters. However, we recognize that expanding the feature set may introduce biologically irrelevant noise, particularly in 3D microscopy data where limited z-axis resolution can lead to morphological artifacts. This highlights the potential need in the future to include unbiased feature selection strategies, such as bootstrapping methods[67], to ensure the identification of meaningful and biologically relevant parameters. Careful consideration of these aspects is key to maximizing the interpretability and predictive value of analyses performed with BEHAV3D-TP.”

      Comment #4: The authors have already published two papers on BEHAV3D [Alieva M et al. Nat Protoc. 2024 Jul;19(7): 2052-2084; Dekkers JF, et al. Nat Biotechnol. 2023 Jan;41(1):60-69]. Although the previous two papers used BEHAV3D to analyze T cells, the basic pipeline and computational steps are similar, in particular regarding cell segmentation and tracking. The addition of a "Heterogeneity module" based on PCA analysis does not make a significant advancement in terms of image analysis and quantification.

      We want to emphasize that we have no intention of duplicating our previous publications. In this manuscript, we have consistently cited our foundational papers, where BEHAV3D was first developed for T cell migratory analysis in in vitro settings. In the introduction, we clearly state that our earlier work inspired us to adopt a similar approach for analyzing cell behavior in intravital microscopy (IVM) data, addressing the specific needs and complexities of analyzing tumor cell behaviors in the tumor microenvironment.

      Importantly, our new work provides several key advancements: 1) a pipeline specifically adapted for intravital microscopy (IVM) data; 2) integration of spatial characteristics from both large-scale and small-scale phenotyping; and 3) a zero-code approach designed to empower researchers without coding skills to effectively utilize the tool. We believe that these enhancements represent meaningful progress in the analysis of cell behaviors within the tumor microenvironment which will be valuable for the IVM community. We ensure that these points are clearly articulated in the revised manuscript:

      In introduction: “In line with this concept of characterizing cellular dynamic properties for cell classification, we have previously developed an analytical platform termed BEHAV3D[19,21] allowing to perform behavioral phenotyping of engineered T cells targeting cancer. While BEHAV3D was initially developed to analyze T cell migratory behavior under controlled in vitro conditions, we sought to expand its application to investigate tumor cell behaviors in IVM data, where the complexity of the TME presents distinct analytical challenges. This manuscript builds on our foundational work but represents a significant advancement by adapting the pipeline specifically for IVM datasets.”

      Reviewer #3 (Recommendations for the authors): 

      (1) If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, Stardisk & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D. 

      We thank the reviewer for this recommendation and, as stated above, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have updated our tool to support data formats generated by open-source Fiji plugins such as TrackMate, MTrackJ, and ManualTracking, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgeneDream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). In the revised manuscript, we detail this new functionality and demonstrate the operation of the BEHAV3D-TP heterogeneity module using an example dataset of glioma tumors.

      Additionally, we have updated the introduction to better clarify the scope of our study (see comment #1 from Reviewer #3) and to include references to existing image processing solutions.

      (2) For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells. 

      As noted in the comment above, the revised manuscript now incorporates references to relevant literature that support our understanding that behavioral differences among cells are driven by their underlying functional differences (See comment #2 from Reviewer #3). Additionally, we would like to point to Figure 2d and Supplementary Fig 4 c that provide evidence of the functional distinctions between the identified clusters.

      (3) The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have added the flexibility to incorporate a wide range of features, including morphological ones, and enabled users to select the specific features they wish to include in their analysis. To illustrate this functionality, we have included two example datasets analyzed using this approach (see comment #3 from Reviewer #3). Additionally, as indicated above, we emphasize the importance of careful selection and interpretation of features, as improper choices may lead to biologically irrelevant results. This clarification is intended to ensure that users apply the tool thoughtfully and derive meaningful insights.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We thank reviewer 1 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 plays a role beyond mitochondrial fission in zygotes. However, the reasons why Drp1 KO zygotes differ from the somatic Drp1 KO model can be discussed as follows.

      First, the reviewer mentioned that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images of Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al., Curr Biol. 2014, PMID: 25264261, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study. Mitochondria in oocytes/zygotes have the shape of a small sphere with irregular cristae located peripherally. These structural features may cause insensitivity or resistance to inner membrane fusion and the resultant failure to form tubular mitochondria as seen in somatic cell models. Nonetheless, quantitative analysis of EM images in the revised version confirmed that the mitochondria of Drp1-depleted embryos were not only enlarged but also significantly elongated (Figure 2J-2M). Therefore, in Drp1-depleted embryos, significant structural and functional (e.g., asymmetry between daughters) changes in mitochondria were observed, and these are expected to lead to defects in embryonic development.

      As for mitochondrial transport, we do not fully understand the intent of this question, but we do not entirely rule out mitochondrial transport. At least clustered mitochondria did not disperse again, but how mitochondria behave through the cytoskeleton within clusters will require further study, as the reviewer pointed out.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors show no effect of Myo19 Trim-Away, yet it remains unclear whether myo19 is involved in the positioning of mitochondria around the spindle. Judging by their co-localization during that stage, it might be. Therefore, in the absence of myo19, mitochondria might remain evenly distributed throughout mitosis, thus passively resulting in equal partitioning to daughter cells, with no severe developmental defects. Could the authors show a video of the whole process and discuss it?

      We have newly performed live imaging of mitochondria and chromosomes in Myo19 Trim-Away zygotes (n=13). As shown in Figure 1-figure supplement 2 and Figure 1-Video 2, there were no obvious changes in mitochondrial (and chromosomal) dynamics throughout the first cleavage, and no significant mitochondrial asymmetry was observed. Therefore, we conclude that depletion of Myo19 does not cause mitochondrial asymmetry during embryonic cleavage. These results are described in the revised manuscript (Line 218-221).

      (2) Mitochondrial aggregation upon Drp1 depletion should be characterized in more detail: for example, % of mitochondria free, % in small clusters (> X diameter), and % in big clusters (>Y diameter).

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control embryos, Drp1 Trim-Away embryos, and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). In control embryos, mitochondria were interspersed in a large number of small clusters, whereas in Drp1-depleted embryos, mitochondria became highly aggregated into a small number of large clusters, a phenotype that was reversed by expression of mCh-Drp1. These results are described in the revised manuscript (Line 242-245).
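
      For readers unfamiliar with this type of measurement, cluster size and number can be obtained from a thresholded (binary) image by connected-component labelling. The sketch below is a generic illustration with toy masks; it is an assumption about how such a quantification could be done and not a description of the image analysis used in the study.

```python
from collections import deque

# Generic connected-component count on a 2D binary mask (4-connectivity).
# The masks below are toy data, not measurements from the study.

def cluster_sizes(mask):
    """Return the pixel counts of all foreground clusters, largest first."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)


many_small = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
few_large = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
small_sizes = cluster_sizes(many_small)  # four clusters
large_sizes = cluster_sizes(few_large)   # one cluster
```

      The number of clusters is the length of the returned list, and the size distribution can be compared between conditions.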

      (3) The discrepancies with parthenogenetic embryos derived from Drp1 (-/-) parthenotes should be commented on. Quantification of the dimensions of the clusters would help establish the degree of similarity/difference. Could the authors comment on their hypothesis as to why the clusters are remarkably larger in Drp1 depleted zygotes?

      In the revised version, we have quantified the mitochondrial aggregation in Drp1 KO parthenotes (Figure 2-figure supplement 1; the data for Drp1 KO parthenotes have been moved to the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). The size of mitochondrial clusters in Drp1 KO parthenotes was significantly increased compared to controls, but as the reviewer noted, mitochondrial aggregation appears to be moderate compared to that in Drp1-depleted embryos. The phenotypic discrepancies between the two Drp1-deficient embryo models are discussed below.

      First, it is clear that the phenotypic severity of Drp1 KO oocytes depends on the age of the female. Indeed, oocytes collected from 8-week-old females arrested in meiosis after NEB, mainly due to marked mitochondrial aggregation (Udagawa et al., Curr Biol. 2014, PMID: 25264261), whereas oocytes from juvenile females completed meiosis (Adhikari et al., Sci Adv. 2022, PMID: 35704569); Drp1 KO parthenotes were therefore obtained from juvenile females in the present study. Comparison of mitochondrial morphology in Drp1 KO oocytes in both papers also suggests that mitochondrial aggregation in adult mice is more intense (Udagawa et al., Curr Biol. 2014: Fig. 2A) than in juvenile mice (Adhikari et al., Sci Adv. 2022: Fig. 1G, 1H) and appears similar to that in Drp1-depleted embryos in this study (Figure 2E). There may be differences in the level of Drp1 depletion among these Drp1-deficient oocytes/zygotes. Similar differences between juvenile and adult KO females have been reported in a previous paper (Yueh et al., Development 2021, PMID: 34935904): adult-derived Smc3<sup>Δ/Δ</sup> zygotes arrested at the 2-cell stage, whereas juvenile-derived Smc3<sup>Δ/Δ</sup> zygotes had developmental competence comparable to the wild type. Remarkably, the SMC3 protein level in juvenile Smc3<sup>Δ/Δ</sup> oocytes was also comparable to that in Smc3<sup>fl/fl</sup> oocytes. The authors surmised that the decline in maternal SMC3 between the juvenile stage and sexual maturity is probably due to the continuous induction of the promoter-Cre driver, suggesting that similar induction may also occur in Drp1 KO oocytes. In addition, we observed not only age differences but also batch differences in Drp1 KO oocytes (and resulting embryos), such that little mitochondrial aggregation was observed in oocytes collected from some juvenile KO colonies. Therefore, for KO models showing age (sexual maturation)-dependent gradual phenotypic changes, Trim-Away may be an approach that provides more reproducible results, as it induces acute degradation of maternal proteins.

      (4) Mitochondrial clusters in Drp1 trim-away zygotes resemble those seen when defects in mitochondrial positioning are obtained by TRAK2 induction (PMID: 38917013), pointing again to a role of actin in the clustering process. Could the authors explore the role of actin further?

      TRAK2 and microtubule-dependent mechanisms may also be involved in mitochondrial dynamics during the first cleavage division, possibly in association with migration of the two pronuclei. Although the mitochondrial aggregation induced by TRAK2 overexpression is similar to that in Drp1-depleted embryos, it is unlikely that the changes seen at the EM level in Drp1-depleted embryos (enlarged mitochondria, etc.) occurred. In addition, in TRAK2-overexpressing embryos, rather than uneven partitioning of mitochondria, the daughter blastomeres themselves were uneven in size after cleavage, making it difficult to precisely assess the similarity between the two models.

      Regarding the role of F-actin, we show that the subcellular distribution of cytoplasmic actin overlaps with that of mitochondria throughout the first cleavage and that actin seems to accumulate in aggregated mitochondria, particularly during the mitotic phase, when a higher correlation was observed (Figure 1E). Although we did not observe that actin and the Myo19 motor regulate mitochondrial partitioning, as reported in somatic cell-based studies, it is possible that actin accumulated in mitochondria is indirectly involved in mitochondrial dynamics via mitochondrial fission. For example, inverted formin 2 (INF2) enhances actin polymerization and is required for efficient mitochondrial fission, acting upstream of Drp1 (Korobova et al., Science 2013, PMID: 23349293). In the revised manuscript, we have added a description of this point (Line 452-456).

      (5) Electron microscopy images showed indeed aberrant morphology of the mitochondria, yet not a hyperfused morphology. Aspect ratio (long/short axis) quantification should be included, besides the current measurement, since mitochondria in Drp1 trim-away look bigger yet as round as in the control.

      In the revised version, detailed quantitative data on EM images have been added (Figure 2J-2M). In Drp1-depleted embryos, significant increases were observed in both the major and minor axes of mitochondria. Like the reviewer, we had assumed that mitochondria in depleted embryos were enlarged rather than elongated, but the quantification of the aspect ratio shows that significant elongation occurred. These results have been described in the revised manuscript (Line 252-256).
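
      The point that both axes can increase while the aspect ratio also rises can be made concrete with a small sketch. The axis lengths below are made-up numbers chosen only to illustrate the arithmetic; they are not values from Figure 2J-2M.

```python
# Aspect ratio = major axis / minor axis. All axis lengths below are
# made-up illustrative values, not measurements from the study.

def aspect_ratio(major_axis, minor_axis):
    """Return major/minor; a value of 1.0 corresponds to a circular profile."""
    if minor_axis <= 0:
        raise ValueError("minor axis must be positive")
    return major_axis / minor_axis


def mean_aspect_ratio(axes):
    """Average the aspect ratio over a list of (major, minor) pairs."""
    return sum(aspect_ratio(major, minor) for major, minor in axes) / len(axes)


control_axes = [(0.50, 0.45), (0.55, 0.50)]   # hypothetical, in µm
depleted_axes = [(0.90, 0.60), (1.00, 0.65)]  # both axes larger

control_mean = mean_aspect_ratio(control_axes)
depleted_mean = mean_aspect_ratio(depleted_axes)
```

      In this toy example both the major and minor axes are larger in the second group, yet the major axis grows proportionally more, so the mean aspect ratio is higher as well.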

      (6) Why are mitochondria in golgi-mcherry-expressing cells showing a different morphology of the clusters?

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      (7) Authors comment on ROS being enriched (highly accumulated) in mitochondria. However, while quantification is missing, it might seem that ROS are equally distributed in control or Drp1 Trim-Away embryos. Could the authors quantify ROS signal inside and outside of the mitochondria, perhaps using a mask drawn by mitotracker? Furthermore, it would make these data more convincing to artificially induce/deplete ROS to validate the sensitivity of the technique to variations. Also, why is ROS pattern referred to as ectopic?

      Thank you for your useful suggestions. In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E). The term ectopic was used to mean excessive accumulation of ROS in the mitochondria compared to normal embryos, but it has been deleted as it is not very accurate.
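
      The mask-based quantification can be sketched generically as follows: the mitochondrial channel is binarised with a threshold, and the ROS signal is then averaged separately over pixels inside and outside the mask. The threshold value and the pixel intensities are illustrative assumptions; the procedure actually used is the one given in the revised Methods.

```python
# Generic mask-based quantification on toy 2D images (lists of rows).
# Threshold and intensities are illustrative assumptions only.

def mean_signal_inside_outside(signal, mito, threshold):
    """Binarise `mito` at `threshold`, then average `signal` in and out of the mask.

    Returns (mean_inside, mean_outside); a mean is NaN if its region is empty.
    """
    inside, outside = [], []
    for signal_row, mito_row in zip(signal, mito):
        for s, m in zip(signal_row, mito_row):
            (inside if m >= threshold else outside).append(s)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(inside), mean(outside)


mito_image = [[200, 10], [180, 20]]  # mitochondrial channel
ros_image = [[90, 30], [110, 10]]    # ROS channel, same pixel grid
inside_mean, outside_mean = mean_signal_inside_outside(
    ros_image, mito_image, threshold=100
)
enrichment = inside_mean / outside_mean
```

      Reporting the inside/outside ratio rather than the raw intensities is one way to make embryos with different overall staining levels comparable.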

      Minor comments:

      (A) Video 1: images at t=-00:20 and t=00:00 of the mtGFP are actually the same images as H2B-mCherry.

      Probably a faulty filter/shutter control failed to capture GFP fluorescence at these times. It appears that the autocontrast function detected a small amount of mCherry fluorescence leakage. It would be possible to replace it with another video, but as the relevant frames were unrelated to the analysis, the previous video was used as is. The same problem also occurs in the newly added Myo19-depleted zygote movie (Figure 1-Video 2, 03:15).

      (B) Could you calculate the degree of colocalization between mt-GFP and ER-mCherry in ctrl and Drp1 trim-away? While it is apparent that ER is somehow more associated with mitochondrial clusters, it would be informative to quantify it.

      Since the ER is partially confined to the mitochondrial aggregation site, it was difficult to calculate correlation coefficients from fluorescence images of mt-GFP and ER-mCherry to quantitatively assess colocalization. Instead, line scan analysis of whole mitochondrial clumps showed that the peak of the ER-mCherry signal overlaps with that of mt-GFP, but this is not the case for Golgi-mCherry or peroxisome-mCherry (Figure 2-figure supplement 2A-2C).
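
      The idea behind comparing line-scan peaks can be illustrated with a minimal sketch: intensity profiles of two channels are sampled along the same line, and the positions of their maxima are compared. The profiles below are toy values and the helper names are hypothetical; this is not the analysis code used for Figure 2-figure supplement 2.

```python
# Toy line-scan comparison: do two channels peak at the same position?
# Profiles are invented values sampled along the same line.

def peak_position(profile):
    """Return the index of the maximum intensity along the line."""
    return max(range(len(profile)), key=profile.__getitem__)


def peak_offset(profile_a, profile_b):
    """Return the absolute distance (in samples) between the two peaks."""
    return abs(peak_position(profile_a) - peak_position(profile_b))


mito_profile = [5, 20, 80, 30, 5]
er_profile = [4, 25, 70, 28, 6]     # peaks where the mitochondrial signal peaks
golgi_profile = [60, 30, 10, 8, 5]  # peaks elsewhere

er_offset = peak_offset(mito_profile, er_profile)
golgi_offset = peak_offset(mito_profile, golgi_profile)
```

      An offset of zero means the two channels peak at the same position along the line, whereas a larger offset indicates that the signals are spatially separated.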

      (C) Regarding the developmental arrest: The quantification of the different stages at each developmental time could be more informative. For example, at E4.5 how many embryos are at each stage (2-cell, 4-cell, ... blastocyst)? Also, could the authors comment on the reduction in developmental competence in Figure 4C, regarding the blastocyst stage?

      Many arrested embryos do not maintain their morphologies and undergo a unique degenerative process over time, known as cell fragmentation. Therefore, it is difficult to accurately determine the number of each developmental stage at, for example, E4.5 days. In this study, the 2-cell stage was observed at E1.5, the 4-8 cell at E2.5-E3.0, morula at E3.5 and the blastocyst at E4.5.

      Although the rate of embryos reaching the blastocyst stage was reduced compared to that of normal embryos, overexpression of mCh-Drp1 may explain the incomplete restoration of developmental competence, since embryos injected solely with mCh-Drp1 mRNA also showed reduced developmental competence. For rescue experiments, comparison with internal controls is more important, and we therefore describe it as follows. This is a specific effect of Drp1 deletion because none of the internal control conditions increased arrest at the 2-cell stage and arrest was completely reversed by microinjecting Trim-Away-insensitive exogenous mCh-Drp1 mRNA (Line 337-340).

      (D) In lines 103 to 105, proliferation should be changed to division or development.

      In the revised version, proliferation has been changed to division (Line 103).

      (E) Could the authors reference the statement in lines 168-169?

      The following 3 references have been added (Hardy et al., 1993, PMID: 8410824; Meriano et al., 2004, PMID: 15588469; Seikkula et al., 2018, PMID: 29525505).

      (F) Line 448: "Cells lacking Drp1 have highly elongated mitochondria that cannot be divided into transportable units,..." This is clearly not the case for zygotes, so why are then these mitochondria still clustering and not transported elsewhere?

      Although it is difficult to answer the reviewer's question precisely, EM images of Drp1-depleted embryos suggest that individual mitochondria are not only enlarged but also have increased outer membrane attachment due to excessive aggregation. These large mitochondrial clumps may therefore be preventing transport.

      Reviewer #2 (Public review):

      We thank reviewer 2 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      In the revised version, the time after hCG has been indicated (Line 176-182). For the subsequent Drp1 depletion experiments, the revised version notes that “no significant delay in cell cycle progression was observed following Drp1 depletion (data not shown) compared to control embryos (Figure 1A)” (Line 291-193). There was a slight discrepancy in the time post-hCG between live imaging and immunofluorescence analysis (Figure 1-figure supplement 1A), which may be due to manipulation of zygotes outside the incubator during the microinjection of mRNA.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, due to the injection of various mRNAs to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 h of incubation for their fluorescent proteins to be sufficiently expressed. Therefore, for the Western blot analysis, samples were prepared according to the time of the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      In the revised version, mitochondrial aggregation has been quantified by comparing the cluster size and number in control embryos, Drp1 Trim-Away embryos, and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). We have also quantified the mitochondrial aggregation in Drp1<sup>fl/fl</sup> and Drp1<sup>Δ/Δ</sup> parthenotes (Figure 2-figure supplement 1; note that the data for Drp1 KO parthenotes have been moved to the supplemental figure, due to lack of space in Figure 2 caused by the addition of quantitative data for Drp1 Trim-Away embryos). Mitochondria appear to be slightly more aggregated in Drp1<sup>fl/fl</sup> embryos than in control, but no significant differences in cluster size or number were observed (data not shown). On the other hand, mitochondrial clusters in Drp1 Trim-Away embryos were remarkably larger than those in Drp1<sup>Δ/Δ</sup> parthenotes. Please refer to the response to reviewer 1's comment (3) for discussion of this discrepancy.

      As noted by the reviewer, compared to other mitochondrial images, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation. The exact reason is not known, but may be due to inter-lot variation of Trim21 mRNA used in this experiment. Nevertheless, substantial mitochondrial aggregation was observed compared to the control, which does not seem to affect the conclusion.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      In the revised version, the band intensities in the Western blot analyses were quantified, and the quantification confirmed the previous results (Figure 1H for Myo19 depletion, Figure 2B for Drp1 expression during preimplantation development, Figure 2D for Drp1 depletion). The number of embryos analyzed is given in the figure legends (pooled samples of 20 to 100 embryos were used).

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows the accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E).

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      In the revised manuscript, we have discussed this reference (Zhou et al., Nature Communications, PMID: 36513638) (Line 482-483).

      Reviewer #2 (Recommendations For The Authors):

      The authors report that disruption of F-actin organization led to asymmetry in mitochondrial inheritance, however depletion of Myo19 does not impact inheritance. The authors note in the discussion that loss of another mitochondrial motor protein, Miro, has been shown to affect mitochondrial inheritance. They suggest this may be due to reduced levels of Myo19, despite data from the present study suggesting a lack of involvement of Myo19. Given that Miro1 also interacts with microtubules, and crosstalk between actin filaments and microtubules has been reported, have the authors considered whether other motor proteins, such as KIF5, may be involved in mitochondrial movement in the zygote and therefore inheritance? Myo19 also plays a role in mitochondrial architecture. Were any differences noted at the EM level?

      During oocyte meiosis and early embryonic cleavage, kinesin-5 has been reported to be important for the formation of bipolar spindles (Fitzharris, Curr Biol. 2009, PMID: 19465601) and may have some involvement in mitochondrial dynamics. Given that the migration of the two pronuclei towards the zygotic centre occurs in a dynein-dependent manner (Scheffler et al., Nat Commun. 2021, PMID: 33547291), dynein may also be involved in the process of mitochondrial accumulation around the pronuclei. Nevertheless, whether microtubule-dependent mechanisms regulate mitochondrial partitioning remains controversial. Mitochondria basically diverge from microtubules at the onset of mitosis, and indeed Miro1-deleted zygotes did not show asymmetric mitochondrial partitioning (Lee et al., Front Cell Dev Biol. 2022, PMID: 36325364). More recently, it was reported that overexpression of TRAK2 causes significant mitochondrial aggregation in embryos (Lee et al., Proc Natl Acad Sci U S A. 2024, PMID: 38917013), but since overexpression might disrupt the regulatory balance maintained by other motor/adaptor complexes, further investigation using TRAK2-deficient embryos is needed.

      As noted by the reviewer, Myo19 seems to be important for the maintenance of mitochondrial cristae architecture and, consequently, for the regulation of mitochondrial function (Shi et al., Nat Commun. 2022, PMID: 35562374). We have not examined Myo19-depleted embryos by EM, but we assessed their membrane potential and ROS by TMRM and H2DCF staining, respectively, and confirmed that they were comparable to those of control embryos (data not shown). The loss of Myo19 in zygotes/embryos did not cause any functional changes in mitochondria, suggesting that mitochondrial architecture may not be substantially affected either.

      Transcriptomic analysis would be useful to identify alterations in cell cycle checkpoint regulators, as well as immunofluorescence to identify changes in spindle assembly checkpoint protein recruitment.

      The present results showed that the majority of Drp1-depleted embryos arrest at the G2 stage, possibly due to cell cycle checkpoint mechanisms. Transcriptome analysis would certainly be beneficial, but ultimately a more detailed analysis of the proteins involved and their phosphorylation states will be needed for an accurate assessment. These studies will be the subject of future work.

      Minor comments:

      There are many instances where the English could be improved, particularly the overuse of the word 'the'.

      We have carefully rechecked the manuscript and hope that the English has been improved.

      Line 144: replace 'took' with 'take'.

      We have corrected this in the revised version (Line 140).

      Line 157: it is unclear what is meant by 'hinders the functional importance of Drp1 in mature oocytes and embryos'.

      This description has been corrected to “complicates the functional analysis of Drp1 in mature oocytes and embryos” (Line 152-153).

      Line 198: replace with 'displayed a mitochondrial distribution pattern closely associated with'

      We have corrected this in the revised version (Line 195-196).

      Line 200: provide a time to clarify when the cytoplasmic meshwork was 'subsequently reorganized'

      In the revised version, “at the metaphase” has been added (Line 198).

      Line 204: replace 'to' with 'for'

      We have corrected this in the revised version (Line 203).

      Lines 285-87: consider rearranging the text to improve the flow.

      To improve the flow of the surrounding text, the following sentence has been added: “We postulated that this asymmetry was due to non-uniformity in the distribution of mitochondria around the spindle” (Line 295-297).

      Line 418: replace 'central' with 'centre'

      We have corrected this in the revised version (Line 430).

      Line 427: replace 'pertaining' with 'partitioning'

      We have corrected this in the revised version (Line 438).

      Line 574: clarify to what '1-5% of that of the oocytes' refers

      We have corrected it to “1-5% of the total volume of the zygote.” (Line 587-588).

      Line 619: indicate the dilution used

      We apologize for the previous incorrect description. We used a part of the extract as the template, not a dilution, and have corrected it to be accurate (Line 631-632).

      Line 634: replace 'on' with 'in' and detail in which medium embryos were mounted.

      We have corrected this in the revised version (Line 647).

      Please check all spelling in the figures.

      Figure 1J - inheritance is spelt incorrectly.

      Figure-Suppl 1, D: Interphase (PN) and (2-cell) is spelt incorrectly. G: inheritance is spelt incorrectly.

      Figure 5F - bottom section prior to cytokinesis, spindle is spelt 'spincle'

      Ensure consistency in abbreviation use (e.g. use of NEB and NEBD).

      Thank you for your careful correction of typographical errors. In the revised version, all points raised by the reviewers have been corrected.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      In the revised manuscript, we have added the following comment: swollen or partially elongated mitochondria with lamellar cristae structures in the inner membrane were observed in Drp1-depleted embryos. In addition, quantification of the aspect ratio (long/short axis) shows that significant mitochondrial elongation occurred (Figure 2M). These results have been described in the revised manuscript (Line 251-256).

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      Thank you for your very useful comments. Although it would be interesting to investigate whether alterations in ATP levels occurred in localized areas (e.g., around the spindle), the present study used a conventional fluorescence microscope rather than a confocal laser microscope to observe ATeam fluorescence, in order to quantify the fluorescence intensity in the whole embryo (or whole blastomere). We therefore cannot currently provide the images that the reviewer requested. As shown in Figure-figure supplement 1C, the ATP levels tend to be higher at the cell periphery in control embryos and at the mitochondrial aggregation areas in Drp1-depleted embryos, but high-resolution confocal images would be needed to show this clearly.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Review of multiple videos shows that aggregated mitochondria were localized toward the cell center, but did not exhibit the behavior of preferentially concentrating near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca<sup>2+</sup> response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We think that the reviewer's comments are mostly correct. It is clear that there is a bias in Ca<sup>2+</sup> store levels between blastomeres of Drp1-depleted embryos. However, since mitochondria were not stained simultaneously in this experiment, we cannot draw detailed conclusions, such as whether daughter blastomeres that inherit more mitochondria have higher Ca<sup>2+</sup> stores, or whether blastomeres with more aggregated mitochondria have lower Ca<sup>2+</sup> stores.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked centration of mitochondrial clusters in Drp1-depleted embryos appears to be associated with migration of the pronuclei toward the cell center, which is unique to the first embryonic cleavage. Since the assembly of the male and female pronuclei at the cell center is also unique to the first cleavage, binucleation due to mitochondrial misplacement was observed only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is still expected to occur, some abnormalities in embryonic development are likely to be observed.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments

      - Line 262: "Since mitochondrial dynamics are spatially coordinated at the ER-mitochondria MCSs," adequate ref. would better be added.

      We have added an adequate reference to the revised manuscript (Friedman et al., 2011, PMID: 21885730).

      - Line 333-336: "...as assessed by the presence of the nuclear envelope." Do authors show the data? In Figure 4-figure supplement 1A, the difference of the phosphoH3-ser10 signal between control and Trim-Away group might be weak. For clarity, it would be helpful if authors indicate the different points to note in the figure.

      Although the data are not shown, nuclear staining of arrested 2-cell stage embryos exhibited clear nuclear membranes, similar to the DAPI image in Figure 4-figure supplement 1A. We have indicated that the data are not shown in the revised version (Line 345). Based on a report that phosphorylated histone H3 (Ser10) localizes to pericentromeric heterochromatin that can be visualized by DAPI staining in late G2 interphase cells (Hendzel et al., 1997, Chromosoma, PMID: 9362543), this study qualitatively estimated the G2 phase from the phosphorylated histone H3 signal and the DAPI counterstained images. We have noted this point in the revised figure legend (Line 1012-1014).

      Typos or points for reword/rephrase

      - Line 149: "molecular identification" may better be " molecular characteristics".

      We have corrected this in the revised version (Line 145).

      - Line 157: "hinders the functional importance" would be "implies the functional importance" or "complicates the functional analysis".

      We have corrected this in the revised version (Line 152-153).

      - Line 208: "Since the role of F-actin in many cellular events, such as cytokinesis, preclude them as targets for experimentally manipulating mitochondrial distribution, " may better be "Given many cellular roles, disruption of F-actin per se was unsuitable as a strategy for manipulating mitochondrial distribution", for example.

      We have corrected this in the revised version (Line 207-208).

      - Line 260: "with MCSs with the plasma.." may better be "with MCSs such as with the plasma..".

      We have corrected this in the revised version (Line 267-268).

      - Line 312: "distribution and segregation" may better be "distribution and the resulting segregation of the inter-organelle contacts".

      We have corrected this in the revised version (Line 324-325).

      - Line 427: "pertaining" might be "partitioning".

      We have corrected this in the revised version (Line 438).

      Line 463: "loss of Drp1 induced mitochondrial aggregation disturbs" may better be "mitochondrial aggregation induced by the loss of Drp1 disturbs".

      We have corrected this in the revised version (Line 478-479).

      - Line 752: "endoplasmic reticulum (pink) " would be " endoplasmic reticulum (aqua) ".

      We have corrected this in the revised version (Line 780).

      - Figure 5E: "(Noma 2-cell embryos)" would be "(Normal 2-cell embryos)".

      - Figure 5F: "Mitochondrial centration prevents dual spincle assembly" would be "Mitochondrial centration prevents dual spindle assembly".

      Thank you for your careful correction of typographical errors. We have corrected all the words/expressions the reviewer pointed out in the revised version.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      General Statements

      We sincerely thank all three reviewers for their thoughtful and constructive feedback. Your comments were invaluable in improving the clarity and quality of our work.

      In this study, we revisit a previously overlooked lipophilic dye and demonstrate its utility for live-cell imaging: it is transported via a non-vesicular pathway and labels autophagy-related structures. Against the backdrop of increasing attention to membrane contact sites (MCSs), bridge-like lipid transfer proteins (BLTPs), and organelle biogenesis, we propose that a reversible one-way phospholipid transfer activity actually takes place in living cells.

      As Reviewer #1 noted, recent cryo-EM studies (e.g., Oikawa et al.) have highlighted the importance of lipids in autophagosome formation, and several in vitro studies exist. However, we believe it is important to consider how well simplified in vitro reconstitution reflects the complex cellular environment. In addition, to our knowledge, no studies have directly tracked lipid flow dynamics over time in living cells. We believe our work helps fill this gap by combining three technical approaches: (a) R18 as a lipid-tracing dye, (b) FRAP analysis on the isolation membrane, and (c) the use of Ape1 overexpression to stall autophagosome closure, enabling us to visualize reversible lipid flow in vivo. While these techniques may not appear "fancy," we hope they offer new insights that can inspire further exploration of lipid dynamics in a real cellular environment.

      We appreciate Reviewer #2's comments on our high imaging quality and Reviewer #3's recognition of our approach as an elegant way to study lipid transfer. We have revised the manuscript accordingly and included additional explanations, figure clarifications, and planned experiments to address remaining concerns.

      As two key concerns were raised repeatedly by all reviewers, we would like to address them here:

      1. Regarding the concern that the evidence for reversible lipid transfer from the IM to the ER is not sufficiently strong:

      We are deeply grateful to Reviewer #2 for the insightful suggestion to compare the fluorescence recovery of the adjacent bleached ER to that of the ER-IM MCS, to exclude the possibility that recovery at the ER-IM MCS originates from nearby ER rather than from the IM. Following this suggestion, we performed a quantitative analysis using unbleached ER as a background. Interestingly, in every sample, the adjacent bleached ER consistently showed a significantly lower fluorescence recovery than the ER-IM MCS. When we instead used the IM as a background for normalization, the difference became even more pronounced, further supporting the idea that the adjacent ER could not be the source of the recovery signal at the ER-IM MCS. These findings strengthen our conclusion that phospholipid recovery at the MCS could be derived from the IM. The updated analysis and corresponding figure panels (Figure 5K, 5L, and 5M), along with the relevant text (lines 384-396), have been revised accordingly.

      2. Regarding the concern that the evidence for R18 transfer via Atg2 as a bridge-like lipid transfer protein is not sufficiently direct:

      In addition to the evidence presented in this manuscript, we have now cited our parallel study currently under revision (Sakai et al., bioRxiv 2025.05.24.655882v1), where we provide direct evidence that Atg2 indeed functions as a bridge-like lipid transfer protein, rather than a shuttle. Importantly, we also show in that study that R18 transfer requires the bridge-like structure of Atg2. This new reference has been cited in the revised manuscript, and relevant textual explanations have been added to provide further support.

      We hope that the revisions and our revision plan address the reviewers' key concerns. Please find our detailed point-by-point responses below.

      Response to Reviewer #1

      In their study, Hao and colleagues exploited the fluorescent fatty acid R18 to follow phospholipid (PL) transfer in vivo from the endoplasmic reticulum to the IM during autophagosome formation. Although the results are interesting, especially the retrograde transport of PLs, based on the provided data, additional control experiments are needed to firmly support the conclusions.

      We sincerely thank the reviewer for the positive assessment and agree that additional controls are necessary to support our conclusion. Detailed responses and corresponding revisions are provided below.

      An additional point is that the authors also study the internalization of R18 into cells and found a role of lipid flippases and oxysterol binding proteins. While this information could be useful for researchers using this dye, these analyses/findings have no specific connection with the topic of the manuscript, i.e. the PL transfer during autophagosome formation. Therefore, they must be removed.

      We thank the reviewer for the thoughtful comment. We understand the concern that the R18 internalization analysis may appear peripheral to the manuscript's main focus on phospholipid transfer during autophagosome formation. However, we respectfully believe that this section is critical for establishing the mechanistic basis as this study represents the first detailed in vivo application of R18 for tracing lipid dynamics. We believe it is interesting that R18 entry is not due to chemically passive diffusion or non-specific adsorption, but occurs through a biologically regulated, non-vesicular lipid transport pathway. This mechanistic context underpins the reliability of using R18 to monitor ER-to-IM lipid transport in the autophagy pathway.

      To improve clarity and coherence, we have added explanatory text in the Introduction and at the start of the Results section to explicitly link the internalization assay to the subsequent autophagy-related experiments (line 94-98, 185-187). We hope this helps guide the reader through the rationale and relevance of this part of the study.

      Major points:

      1) In general, the quality of the microscopy images is quite poor and this makes it difficult to assess some of the authors' conclusions.

      We thank the reviewer for the feedback. To better address this concern, we would appreciate clarification regarding which specific images or figure panels were found to be of low quality. Overall, we believe the microscopy data presented are of sufficient resolution and clarity to support our main conclusions, as also noted by Reviewer #2 ("the high-quality images and FRAP experiments").

      We acknowledge that certain phenomena, such as occasional R18 labeling of the vacuole, were not clearly explained in the original manuscript. We have now included additional clarification in the results section and mentioned this limitation in the discussion (lines 170-171, 436-438), along with a note on ongoing experiments to further investigate this point.

      2) It would be important to perform some lipidomics analysis to determine in which PLs and other lipids or lipid intermediates R18 is incorporated. First, it will be important to know which major PL species are labelled under the conditions of the experiments done in this study. Second, the authors assume that all the R18 is exclusively incorporated into PLs and this is what they follow in their in vivo experiments. What about acyl-CoA, which has been shown to be a key player in the IM elongation (Graef lab, Cell)?

      We thank the reviewer for raising this point. However, we believe this is based on a misunderstanding of the chemical nature of R18. R18 is not a free fatty acid analog and cannot be incorporated into phospholipids or acyl-CoA via metabolic pathways. Due to its chemical structure (a bulky rhodamine headgroup attached to a long alkyl chain), it cannot undergo enzymatic conjugation or incorporation into membrane lipids. This is why we did not pursue lipidomics analysis. Instead, we focused on characterizing the biological behavior of R18 through a range of live-cell assays, including temperature and ATP dependency and the involvement of flippases, OSBP proteins, and Atg2, all of which support a regulated, non-vesicular lipid transport pathway. Additionally, the AF3 structural model presented in this study is consistent with this interpretation, showing no evidence of R18 forming chemical bonds with phospholipids.

      3) Figure 1A and 1B. The authors conclude that Atg2 is involved in the lipid transfer since R18 does not localize to the PAS/ARS in the atg2KO cells. However, another possible explanation is that in those cells the IM is not formed and does not expand, and consequently R18 is present in low amounts not detectable by fluorescence microscopy. To support their conclusion, the authors must assess PAS-labelling with R18 in cells lacking another ATG gene in which Atg2 is still recruited to the PAS.

      We thank the reviewer for this important suggestion. As noted, the absence of R18 at the PAS in atg2Δ cells may reflect a lack of membrane formation rather than impaired lipid transfer. However, in support of our interpretation, our previous work (Hirata E, Ohya Y, Suzuki K, 2017) has shown that R18 accumulates at PAS-like structures in delipidation mutants, where the IM fails to expand but Atg2 is still recruited (please refer to the attached revision plan for further details). This suggests that the presence of Atg2, rather than the mere existence of a mature IM, contributes to R18 localization.

      To address this, we revised our statement to the more cautious: "R18 was undetectable at the PAS in atg2Δ cells," to avoid overinterpretation (lines 119-120).

      4) Figure 2. As written, the paragraph this figure seems to indicate that flippases are directly involved in the translocation of R18 from the PM to the ER. As correctly indicated by the authors, flippases flip PLs, not fatty acids. Moreover, there are no PL synthesizing at the PM and thus probably R18 is not flipped upon incorporation into PL. As a result, the relevance of flippase in R18 internalization is probably indirect. This must be explained clearly to avoid confusion/misunderstandings.

      We thank the reviewer for this important clarification. We fully agree that flippases act on phospholipids, not fatty acids, and that R18 is not metabolically incorporated into phospholipids at the plasma membrane. However, our ongoing work (Rev. Figure 1) shows that R18 has a preferential labeling affinity for PS and PE in vivo (in yeast phospholipid synthesis mutants), consistent with its flippase-dependent localization. Flippases are known to specifically flip PS and PE. While R18 itself is not enzymatically modified or incorporated into phospholipids, its membrane distribution may thus depend on the lipid environment and the activity of lipid-translocating proteins.

      Preliminary data supporting this observation are included in the "Supplementary Figures for reviewer reference only" and are not part of the public submission.

      5) A couple of manuscript has shown a (partial) role of Drs2 in autophagy. The authors must explain the discrepancy between their own results and what published, especially because they use the GFP-Atg8 processing assay, which is less sensitive than the Pho8delta60 used in the other studies.

      We thank the reviewer for raising this important point. We are aware of prior reports implicating Drs2 in autophagy and in fact discussed this work directly with the authors during the course of our experiments, who kindly provided helpful suggestions. While our GFP-Atg8 processing assay did not show significant defects upon Drs2 deletion, strain background differences may explain this discrepancy. We also appreciate the suggestion to use the Pho8Δ60 assay and plan to include it in future experiments.

      Additionally, authors should check whether the Atg2 and Atg18 proteins are present at the IM-ER membrane contact sites in the same rates after nutrient replenished than when cells are nitrogen-starved, since this complex would determine the lipid transfer dynamics at this membrane contact site.

      We thank the reviewer for the helpful suggestion. We plan to perform additional experiments to monitor Atg18 localization during the nutrient replenishment assay.

      6) Authors used a predicted Atg2 lipid-transfer mutant (Srinivasan et al, J Cel Biol, 2024), but not direct prove that this mutant is defective for this activity. As previously done for other Atg2/ATG2-related manuscripts (Osawa et al, Nat Struct Mol Biol, 2019; Valverde et al, J Cel Biol, 2019), this must be measure in vitro. Moreover, they do not show whether other known functions of Atg2 are unaffected when expressing this Atg2 mutant, e.g. formation of the IM-ER MCSs, Atg2 interaction with Atg9 and localization at the extremity of the IM...

      We thank the reviewer for this concern. The lipid-transfer-deficient Atg2 mutant used here is based on the same structural rationale as in our recent parallel study (Sakai et al., bioRxiv 2025; https://www.biorxiv.org/content/10.1101/2025.05.24.655882v1, currently under revision). In that study, we addressed whether Atg2 indeed functions as a bridge-like lipid transfer protein, and also used R18 to directly demonstrate the lipid transfer defect of this Atg2 mutant in vivo.

      We therefore believe that referencing this study provides mechanistic support for the use of this Atg2 mutant in the current manuscript. A citation and brief explanation have now been added to the revised text (line 315-316, 439-441). We also plan to perform the lipid transfer assay in vitro.

      7) The mNG-Atg8 signal is not recovered in the fluorescent recovery assays. Based on the observation that R18 signal comes back after photobleaching, authors suggest that the supply of Atg8 is not required for IM expansion. This idea is opposite to data where the levels of Atg8 and deconjugation of lipidated Atg8 determines the size of the forming autophagosomes (e.g., Xie et al, Mol Biol Cell, 2008; Nair et al, Autophagy, 2012). Similar results have also been obtained in mammalian cells (Lazarou and Mizushima results in cell lacking components of the two ubiquitin-like conjugation systems). This discrepancy requires an explanation.

      We thank the reviewer for pointing out this imprecise interpretation, and we sincerely apologize for the confusion it may have caused. We fully agree that Atg8 is essential for the expansion of the isolation membrane (IM), as supported by previous studies. In our FRAP data, mNG-Atg8 showed gradual recovery at the later timepoints, indicating that Atg8 can be replenished over time. The reason why R18 recovery appears much more rapid is likely due to the inherently fast lipid transfer activity of Atg2, the bridge-like lipid transport protein. In contrast, Atg8 signal recovery may have been delayed for two reasons: (1) slower recruitment kinetics to the IM, and (2) partial depletion of the available mNG-Atg8 protein pool due to photobleaching during the experiment.

      We have revised the relevant paragraph in the manuscript (line 326-330) to clarify these points and avoid potential misinterpretation.

      8) Although authors claim that there is a retrograde lipid transfer from the IM to the ER, based on the data, it quite difficult to extract these conclusions as they show a decrease in the lipid flow dynamics rather to an inversion of the lipid flow per se. Can the authors exclude that ER microdomains are formed at the ERES in contact with the IM, and consequently what they measure is a slow diffusion of R18-labeled lipid from other part of the ER to these ERES?

      We appreciate the reviewer's insightful comment. Indeed, we are also considering the possibility that lipid-enriched microdomains may form in the ER and contribute to complex lipid dynamics at contact sites. However, direct visualization of such domains in cells remains technically challenging, and this is one of the important directions we aim to pursue in future studies. While our current data do not allow us to definitively state that all recovered lipids originate from the IM, our FRAP experiments provide indirect yet strong support for the possibility that at least a substantial portion of the recovered lipid signal in the ER derives from the IM. Moreover, following Reviewer #2's major point No. 4, we performed a direct comparison of R18 fluorescence recovery between the photobleached ER-IM MCS region and the adjacent bleached ER region (Figure 5K and 5M). Interestingly, each sample consistently showed lower fluorescence recovery in the adjacent bleached ER near the ER-IM MCS (mean = 0.20) than in the ER-IM MCS region (mean = 0.28). To further validate this observation, we also used the IM as a background reference for normalization. This analysis revealed a more pronounced difference, with the adjacent bleached ER near the ER-IM MCS showing a lower recovery (mean = 0.47) than the ER-IM MCS (mean = 0.80).

      As Reviewer #2 pointed out, these results support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS is due to the signal coming from the IM, rather than from the adjacent bleached ER, which recovers more slowly and less efficiently. We have incorporated this new analysis into Figure 5, and accordingly revised the figure legend and main text (lines 384-396).
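For readers unfamiliar with how a FRAP recovery value of this kind can be computed, the following is a minimal sketch of one common approach (double normalization against a reference region). It is not the authors' analysis pipeline: the function names, the choice of reference region, and all intensity values are hypothetical and serve only to illustrate how two bleached regions might be compared.

```python
def normalized_intensity(roi, ref, roi_pre, ref_pre):
    """Intensity of a bleached region relative to its pre-bleach value,
    corrected by a reference region (e.g. unbleached ER or the IM)."""
    return (roi / ref) / (roi_pre / ref_pre)


def recovery_fraction(roi_pre, roi_bleach, roi_final,
                      ref_pre, ref_bleach, ref_final):
    """Fraction of the bleached signal that has recovered by the final frame.

    0 means no recovery, 1 means full recovery to the pre-bleach level.
    """
    post = normalized_intensity(roi_bleach, ref_bleach, roi_pre, ref_pre)
    end = normalized_intensity(roi_final, ref_final, roi_pre, ref_pre)
    if post >= 1.0:
        raise ValueError("region was not bleached; recovery is undefined")
    return (end - post) / (1.0 - post)


# Hypothetical intensities (arbitrary units), NOT data from the study.
mcs = recovery_fraction(roi_pre=100, roi_bleach=20, roi_final=54,
                        ref_pre=100, ref_bleach=100, ref_final=90)
adjacent_er = recovery_fraction(roi_pre=100, roi_bleach=20, roi_final=36,
                                ref_pre=100, ref_bleach=100, ref_final=90)
print(f"ER-IM MCS: {mcs:.2f}, adjacent bleached ER: {adjacent_er:.2f}")
```

Changing the reference region (unbleached ER versus the IM) changes the denominator of the correction, which is one reason the absolute recovery values reported under the two normalizations can differ while the ordering of the two regions stays the same.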

      9) The retrograde PL transfer is studied in cells overexpressing Ape1, in which IM elongation is stalled. This is a non-physiological experimental setup and consequently it is unclear whether what observed applies to normal IM/autophagosomes. This event should be shown to occur in WT cells as well.

      We thank the reviewer for this point. Indeed, it remains technically difficult to visualize lipid flow during normal IM expansion in vivo, as this process is rapid and transient. To date, there are no reports directly addressing lipid flow in this process.

      However, the Ape1 overexpression system provides a strategic advantage by temporally extending the IM elongation phase and spatially enlarging the IM, thus offering a unique opportunity to capture membrane behavior that would otherwise be transient and difficult to resolve. Importantly, this system arrests autophagosome closure, which we leveraged to investigate the potential reversibility of phospholipid transfer in a controlled and prolonged context. Without this system, it would be exceedingly difficult for researchers to examine the directionality of lipid flow in living cells.

      Furthermore, the use of Ape1 overexpression has been widely employed in previous high-impact autophagy studies. We emphasize that our aim is to understand Atg2-mediated lipid transfer, and in this context, the Ape1 system provides a valuable and informative tool without compromising the validity of our conclusions.

      10) From the images provided, it appears that R18 also labels the vacuole. The vacuole form MCSs with the IM. Can the author exclude a passage of R18 from the vacuole to the IM?

      We thank the reviewer for the insightful comment. Our data suggest that R18 traffics from the plasma membrane to the ER, then to autophagy-related structures. Subsequently, as is well known, autophagosomes eventually reach and fuse with the vacuole. This explains the occasional weak R18 signals at the vacuole membrane, particularly in late-stage cells. We have revised the figure and clarified this point in the text to avoid oversimplification of R18 localization (lines 169-171, 426-428).

      Here we have also added results from our ongoing work (in preparation). R18 tends to accumulate in a dot-like compartment after prolonged rapamycin treatment and incubation (Rev. Figure 2). In addition, the vacuolar labeling of R18 correlates with the degradation status of autophagosomes, rather than with reverse lipid transport from the vacuole to the IM (Rev. Figure 2). Taken together, we believe that R18 transport from the vacuole back to the IM is unlikely.

      Preliminary data supporting this response are included in the "Supplementary Figures for reviewer reference only" and are not part of the public submission.

      Minor points:

      1) L66. One report has indicated that Vps13 may also play a role in the transfer of lipids from the ER to the IM (Graef lab, J. Cell Biol).

      Thank you for pointing this out. Their excellent work also suggested that the inherent lipid transfer activity of Atg2 is required for IM expansion. We have revised the sentence (lines 67-68, 312-314) and included the appropriate citation at these two places.

      2) L70. It must be indicated that IM is also called phagophore.

      We have revised the sentence (line 70-71). Thank you for pointing this out.

      3) L74. It is mentioned "Additionally, a hydrophobic cavity in the N-terminal region of Atg2 directly tethers Atg2 to the ER, particularly the ER exit site (ERES), which is considered a key hub for autophagosome biogenesis", but there is no experimental evidence supporting that Atg2 is involved in the tethering with the ERES.

      Thank you for pointing this out. We have removed the N-terminal region part and revised the sentence accordingly (line 79-81) to avoid overstatement.

      4) L90. PAS must be listed between the ARS.

      We have revised the sentence (line 97-98). Thank you for pointing this out.

      5) Upon deletion of ATG39 and ATG40, there is a pronounced reduction of mNG-Atg8 labelled with R18. This would suggest that these two ER-phagy receptors are required for the PL transfer from the ER to the IM, which is not the case as autophagy is mildly affected by the absence of them (e.g., Zhang et al, Autophagy, 2020).

      We thank the reviewer for the important comment and agree that Atg39 and Atg40 are not required for phospholipid transfer from the ER to the IM. We have revised the text (lines 155-157). We would appreciate it if the reviewer could provide the DOI or PubMed ID for this paper.

      6) Authors referred that "no direct evidence has been found to confirm lipid transfer at the ER-IM MCS in living cells" (lines 282-283). However, a recent paper has shown that de novo-synthesized phosphatidylcholine is incorporated from the ER to the autophagosomes and autophagic bodies (Orii et al, J Cell Biol, 2021). This reference should be mentioned in the manuscript.

      Thank you for your insightful reminder. This paper beautifully demonstrated the importance of de novo-synthesized phosphatidylcholine in autophagy using electron microscopy. We have now included its citation and brief discussion in the revised manuscript (lines 74-76, 297-298). However, we respectfully note that direct observation of lipid transfer at the ER-IM MCS in living cells still remains unproven.

      7) In lines 252-253, the sentence "R18 transport from the PM to the ER was partially impaired in osh1Δ osh2Δ, osh6Δ osh7Δ, and oshΔ osh4-1 cells (Figure S3). These results suggest that Osh proteins participate in transferring R18 from the PM to the ER" does not recapitulate what is observed in Fig. S3. Moreover, the Emr lab has generated a tetradeletion mutant in which the PM-ER MCSs are abolished. The authors could examine this mutant.

      We thank the reviewer for this helpful comment and sincerely apologize for the lack of clarity in our original description. Our conclusion was primarily based on the partial PM accumulation of R18 observed in some osh mutant strains shown in Figure S3, which motivated us to further investigate this pathway using the OSW-1 inhibitor. We have revised the corresponding text to improve the logic and clarity of this section.

      We appreciate the recommendation of the tether∆ mutant. Our preliminary tests indicate that R18 still properly labels the ER in tether∆ cells, suggesting that its localization is not due to passive diffusion at membrane contact sites, but rather involves specific transport mechanisms. As this is an initial observation, we plan to confirm the result and include it in a future revision.

      Reviewer #1 (Significance (Required)):

      General assessment: Strength: potential new system to monitor lipid flow. Limitations: Indirect evidence and, in the case of the retrograde transport of phospholipids, it could be an artefact of the employed experimental approach. Advance: Little advance, because something in part already shown in vitro. No new mechanisms uncovered. Audience: Autophagy and membrane contact site fields.

      We sincerely thank the reviewer for the overall evaluation. We agree that our current system offers indirect but promising evidence for lipid transfer events at ER-IM contact sites in vivo. While Atg2-mediated lipid transport has been proposed in vitro, our study adds value by (1) establishing a live-cell imaging approach to monitor lipid flow in a non-vesicular transport pathway, (2) proposing a model of reversible one-way lipid transfer activity, and (3) addressing whether findings from simplified in vitro reconstitution accurately reflect the dynamics in the more complex real cellular environment.

      We recognize the limitations of our current approach and plan to include additional analyses to more cautiously interpret the observed retrograde movement. Although we do not claim to identify a new mechanism, we believe our work provides an interesting framework to inspire future efforts aimed at directly probing lipid flow at membrane contact sites in vivo.

      We also sincerely appreciate the reviewer's recognition of the potential value of this system for the autophagy and membrane contact site communities.

      Response to Reviewer #2

      Non-vesicular lipid transfer plays an essential role in organelle biogenesis. Compared to vesicular lipid transfer, it is faster and more efficient to maintain proper lipid levels in organelles. In this study, Hao et al. introduced a highly lipophilic dye, octadecyl rhodamine B (R18), which specifically labels the ER structures and autophagy-related structures in yeast and mammalian cells. They characterised its distinct lipid entry into yeast cells via lipid flippase Neo1 and Drs2 on the plasma membrane, rather than through the endocytic pathway. They then demonstrated that R18 intracellular trafficking through plasma membrane to ER depends on "box-like" lipid transfer Osh proteins. They further looked into the "bridge-like" lipid transfer protein Atg2, using R18 as a lipid probe to track lipid transfer from ER to the isolation membrane (IM) during membrane expansion and reversible lipid transfer through IM to the ER-IM membrane contact sites (MCS) when autophagy is terminated by nutrient replenishment. The authors provide an interesting model of reversible directionality of Atg2 lipid transfer during autophagy induction and termination.

      We sincerely thank the reviewer for the thoughtful and constructive summary of our work. We are grateful for the recognition of the novelty of using R18 to visualize non-vesicular lipid transfer in vivo and for highlighting the conceptual contribution of our proposed model of reversible Atg2-mediated transport during autophagy.

      In response to the reviewer's valuable suggestions, we have revised key parts of the manuscript and prepared a detailed revision plan to address the specific concerns. We truly appreciate the reviewer's insights, which have been instrumental in improving the clarity of our study.

      Major points:

      1. Line 299-309: The FRAP assays were interesting and well performed. The authors photobleached R18 and Atg8 signal, and found R18 fluorescence recovery but not Atg8, which suggests lipid transfer occurs between ER and the IM and faster than Atg8 lipidation process during IM expansion. These results gave clear evidence that R18 can be transferred during IM expansion. The supply of Atg8 may not be trackable within this time frame, or the recovered amount of Atg8 may not be visualized due to the threshold limitation of confocal microscopy. This does not imply the supply of Atg8 to the IM is not required during IM expansion. This should be clarified.

      We thank the reviewer for this valuable comment and fully agree that Atg8 is essential for IM expansion. We apologize for any ambiguity that may have suggested otherwise.

      As pointed out, the lack of mNG-Atg8 recovery in our FRAP assay likely reflects the slower turnover of lipidated Atg8, limited observation time, and photobleaching of the existing protein pool. Notably, we observed a weak but gradual signal recovery at later time points, supporting this view. We have revised the relevant paragraph in the manuscript (line 326-330) to clarify these points and avoid potential misinterpretation.

      Please clarify how the length of the IM is measured and determined in Figure 4H and Figure 5D.

      We thank the reviewer for the valuable comment. We have now clarified the method for quantifying IM length in the revised manuscript. Specifically, we modified the Statistical Analysis section of the Methods (lines 642-643).

      Line 336-342: The description of the results should be clarified. Based on Figure 5H, the authors observed a significant decrease in the mNG-Atg8 signal during photobleaching of the R18 signal.

      We thank the reviewer for pointing out the ambiguity. We have now clarified the description in the revised manuscript. The sentence has been modified (line 360-362) as follows: "To determine whether nutrient replenishment terminates autophagy, we selectively photobleached the R18 signal and monitored the R18 (photobleached) and mNG-Atg8 (without photobleaching) signal following nutrient replenishment."

      The authors photobleached ER-IM MCS and the ER region (boxed region in Figure 5J) and quantified fluorescence recovery, normalized to the IM region and an ER control. The ER control was taken from the other cell. It would be helpful to compare and analyse the fluorescence recovery of R18 in the bleached ER region near the ER-IM MCS to that in the ER-IM MCS. This would help to confirm the ER-IM MCS fluorescence recovery is due to signal coming from the IM.

      We sincerely thank the reviewer for this insightful suggestion. We have now performed the suggested comparison. Interestingly, each sample consistently showed lower fluorescence recovery in the adjacent bleached ER near the ER-IM MCS (mean = 0.20), compared to the ER-IM MCS region (mean = 0.28). To further validate this observation, we also used the IM as a background reference for normalization. This analysis revealed a more significant difference, with the adjacent bleached ER near the ER-IM MCS showing a lower recovery (mean = 0.47) than the ER-IM MCS (mean = 0.80).

      As the reviewer pointed out, these results support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS is due to the signal coming from the IM, rather than from the adjacent bleached ER, which recovers more slowly and less efficiently. We have incorporated this new analysis into Figure 5, and accordingly revised the figure legend and main text (lines 384-396). Again, we appreciate this constructive and helpful suggestion.
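
      For readers who want to see how a normalized recovery value of this kind can be obtained, the following is a minimal sketch only. It assumes a common double-normalization convention (bleached region divided by a reference region, then scaled to the pre-bleach mean); the function names, the number of pre-bleach frames, and the example intensities are hypothetical and are not taken from the manuscript or its Methods.

```python
def normalize_frap(roi, ref, n_pre):
    """Double-normalize a FRAP trace.

    roi   : mean intensities of the bleached region, one value per frame
    ref   : mean intensities of a reference region (for example the IM or
            an unbleached ER control) over the same frames
    n_pre : number of frames acquired before bleaching (assumed known)
    """
    if len(roi) != len(ref) or not (1 <= n_pre < len(roi)):
        raise ValueError("traces must match in length and contain pre- and post-bleach frames")
    # Dividing by the reference corrects for acquisition bleaching and drift.
    ratio = [r / f for r, f in zip(roi, ref)]
    # Scale so that the pre-bleach level equals 1.
    pre_mean = sum(ratio[:n_pre]) / n_pre
    return [x / pre_mean for x in ratio]


def recovery_fraction(norm, n_pre, n_tail=3):
    """Fraction of the bleached signal recovered at the plateau.

    Uses the first post-bleach frame as the bleach depth and the mean of
    the last n_tail frames as the plateau (both are assumptions).
    """
    post = norm[n_pre]
    plateau = sum(norm[-n_tail:]) / n_tail
    return (plateau - post) / (1.0 - post)


if __name__ == "__main__":
    # Hypothetical intensities: two pre-bleach frames, then recovery.
    roi = [100, 100, 20, 40, 50, 60, 60, 60]
    ref = [200, 200, 200, 200, 200, 200, 200, 200]
    norm = normalize_frap(roi, ref, n_pre=2)
    print(round(recovery_fraction(norm, n_pre=2), 2))
```

      Applying the same functions to the ER-IM MCS trace and to the adjacent bleached ER trace, with the same reference region, would give two recovery values that can be compared directly, which is the kind of comparison described in the response above.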

      In Figure 5K, the autophagic structure or IM labelled by R18 seems to be maintained when the mNG-Atg8 signal decreases or dissociates from the IM. Could the authors comment on how they interpret the termination of the prolonged IM structure and IM shrinkage?

      We thank the reviewer for this insightful observation. Based on our live-cell imaging, we speculate that following the initial dissociation of Atg8, the IM membrane undergoes a relatively slow disassembly process, potentially retracting toward the ER-IM MCS, which often localizes near ER exit sites (ERES). This suggests that IM shrinkage may proceed via Atg8-independent mechanisms. Although the precise pathway remains unclear, we occasionally observed vesiculation events during this phase, supporting the idea that membrane remodeling continues even in the absence of Atg8. In response to this comment, we have revised our manuscript to reflect these interpretations (line 494-496).

      The authors have shown that Atg2Δ and the Atg2LT lipid transfer mutant impair R18 labelling of autophagic structures in Figure 4C. However, the evidence supporting that R18 fluorescence recovery at ER-IM MCS is mediated by reversible Atg2 lipid transfer is not direct. It would be helpful to clarify whether Atg2 stays on the enlarged autophagic membranes when the membrane has reached its maximum length and no longer grows.

      We thank the reviewer for this important suggestion. As noted in our response to Reviewer 1 (Major Point 8-2), clarifying whether Atg2/Atg18 remains at the ER-IM contact sites after IM expansion is indeed important for supporting the reversible lipid transfer model. We plan to monitor the localization of Atg18 during the nutrient replenishment assay.

      Minor points:

      1. Figure 2A "Dpm-GFP" is missing. The experiment replicates in Figure 2M should be indicated.

      We thank the reviewer for pointing out these issues. The label for "Dpm-GFP" has been added in Figure 2A, and the number of experimental replicates for Figure 2M is now indicated in the figure legend.

      Figure S2, the magenta panel should be "R18".

      We thank the reviewer for catching this labeling error. We have corrected the magenta panel label in Figure S2 to "R18" in the revised version of the figure.

      Line 341-342: "Figure 5H and 5J" should be "Figure 5H and 5I"

      We thank the reviewer for pointing out this error. The citation has been corrected from "Figure 5H and 5J" to "Figure 5H and 5I" in the revised manuscript.

      Please describe how the lipid docking model of Atg2 is generated.

      We thank the reviewer for this question. We have added a description of the modeling approach in the Methods section of the revised manuscript (lines 640-646). We also added the configuration files of AlphaFold3 to the supplementary information.

      Reviewer #2 (Significance (Required)):

      Currently, lipid probes are emerging as powerful tools to understand membrane dynamics, integrity, and the lipid-mediated cellular functions. In this manuscript, the authors performed a detailed characterisation of octadecyl rhodamine B (R18) as a potential lipid probe, which specifically labels ER and autophagic membranes. They present high quality imaging data and performed FRAP experiments to monitor the membrane dynamics and investigate the lipid transfer directionality between the ER and autophagic structure. However, the evidence of Atg2-mediated reversible lipid transfer may not be direct and sufficient. The proposed reversible lipid transfer model is interesting and provides an explanation of lipid level regulation during autophagosome formation.

      We sincerely thank the reviewer for the positive assessment of our work and for acknowledging the potential of R18 as a lipid probe, as well as the quality of our imaging and FRAP experiments. We are particularly grateful that the reviewer found the proposed model of reversible lipid transfer both interesting and relevant to the broader question of lipid regulation during autophagosome formation.

      Regarding the reviewer's concern that the evidence for Atg2-mediated reversible lipid transfer may not be sufficiently direct, we agree this is a critical point. While technical limitations currently prevent direct visualization of lipid flow reversal at single-molecule resolution in vivo, we hope our revision plan will strengthen the proposed model and better convey its biological relevance, while also acknowledging the current limitations and the need for further mechanistic work.

      Response to Reviewer #3

      The authors address the question of how autophagic membrane seeds expand into autophagosomes. After nucleation, IMs expand in dependence of the bridge-like lipid transfer protein Atg2, which has been shown to tether the IM to the ER. Several studies have shown in vitro evidence for direct lipid transfer by Atg2 between tethered membranes, and previous evidence has shown that the hydrophobic groove of Atg2 implicated in lipid transfer is required for autophagosome biogenesis in vivo in yeast and mammalian cells.

      In this manuscript, the authors take advantage of the dye R18, which they show accumulates mainly in the ER after a few minutes. They show specifically that the import of R18 into cells and transfer to the ER depends on the activity of flippases in the plasma membrane and OSBP-related lipid transporters. Using different sets of FRAP experiments, the authors track the fluorescence recovery of R18 in the IM, the IM-ER membrane contact site and the neighboring ER. From these experiments the authors conclude that (a) R18 is transferred to IM from the ER when IMs expand and (b) can be transferred from IMs back to the ER when autophagy is deactivated.

      The use of a lipophilic dye to monitor lipid dynamics during IM expansion or dissolution is an elegant way to probe the mechanisms of lipid transfer across ER-IM contact sites. Quantitative in vivo data is critically needed to address this fundamental question in autophagy and contact site biology. However, the study remains limited in providing direct evidence that it is indeed the lipid transfer activity of Atg2, which underlies the R18 dynamics in IMs in vivo.

      We sincerely thank the reviewer for this thoughtful and encouraging summary. We appreciate the recognition of our approach using R18 to visualize lipid dynamics at ER-IM contact sites, and agree that in vivo quantitative data are critically needed to advance our understanding of autophagic membrane expansion.

      We also fully agree with the reviewer that our current study provides indirect, but conceptually informative, support for Atg2-mediated reversible one-way lipid transfer. While prior in vitro studies have demonstrated the lipid transfer capability of Atg2, our goal here was to develop a live-cell system that allows the dynamic tracking of lipid flow in vivo, and to explore the possibility of reversible transport during autophagy termination. We hope our study will offer unique insights for future studies aiming to directly probe lipid transfer mechanisms in live cells.

      Regarding the reviewer's concern about the lack of direct evidence that Atg2's lipid transfer activity underlies the observed R18 dynamics, we fully acknowledge this limitation. To address this point, we would like to cite our parallel study currently under revision (Sakai et al., bioRxiv 2025.05.24.655882v1), which provides additional mechanistic evidence linking R18 dynamics to the lipid transfer function of Atg2. Further details and planned revisions are described in the responses below.

      Major points:

      (1) The authors use R18 in FRAP experiments to follow its transfer from the ER into IMs. However, whether this transfer is mediated by Atg2 via its inherent lipid transfer activity remains indirect. The only evidence that implicates Atg2 directly is the observation that a lipid transfer deficient Atg2 variant fails to support IM expansion and autophagosome biogenesis. A similar full-length Atg2 mutant has previously been shown to block autophagosome formation in Dabrowski et al. 2023 in yeast, which the authors do not cite or discuss, suggesting the inherent lipid transfer activity of Atg2 is required for IM expansion. However, aside from this experiment, the mechanisms underlying R18 transfer remain unclear and, while they likely depend on or are at least partially mediated by Atg2, they may involve alternative mechanisms including vesicle transport or continuous membrane contacts. Moreover, for the assays with stalled or dissolving IM, it is essential for the authors to test whether Atg2 is still associated with these IMs. It is quite possible that Atg2 dissociates from maximally expanded or dissolving IMs, which would make their interpretation of the data very unlikely. Thus, it will be critical to provide consistent evidence that lipid transfer from the IM to the ER is mediated by Atg2. Ideally, the authors would label IM with BFP-Atg8, R18, and Atg2-GFP and perform their in vivo analysis.

      We sincerely thank the reviewer for the critical comments and valuable suggestions. To further support the link between R18 transfer and Atg2, we would like to highlight two complementary findings. As noted in our response to Reviewer 1 (Major Point 3), R18 can still label the PAS even when Atg2 is recruited but IM expansion is impaired, suggesting that R18 trafficking occurs in an Atg2-dependent manner. In addition, in our parallel study (bioRxiv, 2025.05.24.655882v1), we demonstrated that Atg2 acts as a bridge-like lipid transfer protein. Notably, when we mutated the bridge-forming region of Atg2, R18 transport to the IM was also disrupted.

      We greatly appreciate the reviewer's reminder regarding the study by Dabrowski et al., 2023, which we have now cited and discussed in the revised manuscript (lines 66-68, 312-314). Their findings that the inherent lipid transfer activity of Atg2 is required for autophagosome formation in vivo strongly reinforce our model.

      Regarding the possibility of vesicle transport, we consider this contribution minimal based on R18's preferential labeling of continuous membranes and its divergence from FM4-64 staining. As for the role of continuous membrane contacts, as also mentioned in our response to Reviewer 1, our preliminary tests indicate that R18 still properly labels the ER in tether∆ cells, suggesting that its localization is not due to passive diffusion at membrane contact sites, but rather involves specific transport mechanisms. As this is an initial observation, we plan to confirm the result and include it in a future revision.

      We also thank the reviewer for the suggestion to monitor Atg2 localization at the dissolving IM. As similarly pointed out by two other reviewers, we plan to track Atg18 during the nutrient replenishment assay.

      Finally, we appreciate the idea of triple-labeling with BFP-Atg8, R18, and Atg2-GFP. While our preliminary attempts encountered technical difficulties such as abnormal BFP-Atg8 localization and severe bleaching during long-term imaging in yeast, we plan to optimize this approach in future experiments.

      (2) Given the ER forms contact sites with many organelles using bridge-like lipid transfer proteins, how do the authors explain the preferential accumulation of R18 in ARS and not in, for example, PM (Fmp27), mitochondria, endosomes or vacuole (Vps13)? Why should R18 be specifically transferred by Atg2 and not, or at a much lower rate, by Fmp27 or Vps13?

      We sincerely thank the reviewer for raising this insightful question. Indeed, we have carefully considered this point. Our data indicate that R18 labeling of autophagy-related structures (ARS) depends on Atg2, as demonstrated in the present manuscript and supported by our parallel study currently under revision (bioRxiv, 2025.05.24.655882v1).

      We speculate that the preferential accumulation of R18 in ARS may arise from structural and contextual differences among bridge-like LTPs, such as Atg2, Vps13, and Fmp27. Although all are capable of mediating lipid transfer, these proteins differ in their membrane tethering modes, cargo specificity, and spatial regulation. For example, Atg2 localizes specifically to ER-IM contact sites during autophagosome formation, where membrane expansion requires rapid lipid supply. In contrast, Vps13 and Fmp27 may function at more stable or less dynamic contacts, where lipid turnover or probe accessibility is more limited. We have added a brief discussion of this point in the revised manuscript to reflect this important consideration (lines 439-444).

      (3) Does R18 label autophagic bodies after they are formed? Could the authors add R18 after autophagic bodies have formed in atg15 or pep4 cells?

      We thank the reviewer for this excellent suggestion. To address whether R18 can label autophagic bodies post-formation, we plan to perform additional experiments by adding R18 after autophagic bodies have accumulated in atg15Δ or pep4Δ cells. This will help clarify whether R18 incorporates into pre-formed autophagic bodies or requires earlier membrane dynamics for its labeling.

      (4) Since Neo1- or OSBP-defective cells do not transfer R18 from the PM to the ER or other membranes, the authors should include these strains as controls for ER-dependent R18 transfer to ARSs.

      We thank the reviewer for this insightful suggestion. To further validate the ER-dependency of R18 transfer to autophagy-related structures, we plan to include Neo1- and OSBP-deficient strains as additional controls.

      Comments:

      The authors neglect to mention or discuss important recent literature directly related to their study:

      Schutter et al., Cell (2020); Orii et al., JCB (2021); Polyansky et al., EMBOJ (2022); Dabrowski et al., JCB (2023); Shatz et al., Dev Cell (2024)

      We sincerely thank the reviewer for pointing out these important and highly relevant studies. We apologize for our oversight in not citing them earlier. Each of these works has provided valuable insights that are directly related to and have greatly informed our current study. We have now cited and discussed these references in appropriate sections of the revised manuscript.

      Figure 1A and B: The authors need to describe how these cells were stained with R18 in the figure legend or text to help the reader to understand how these experiments were performed. Figure legends need to indicate at which time point after rapamycin treatment cells were analyzed.

      Thank you for the helpful suggestion. We have now added the corresponding information to the figure legends to clarify the staining procedure and time points.

      The authors need to clarify whether mNG-Atg8 colocalization with R18 was included for dot- and ring-like structures for WT cells as shown separately in 1A but not in 1B.

      Thank you for the comment. The quantification in Figure 1B includes both dot- and ring-like structures of mNG-Atg8 colocalized with R18 in WT cells, as shown in Figure 1A. We have now clarified this point in the revised figure legend.

      Figure 1C: The figure legend needs to describe the conditions cells were treated with and when cells were analyzed after rapamycin treatment (presumably).

      Thank you for the helpful suggestion. We have now added the corresponding information to the figure legends.

      Figure 1C: The authors should combine atg15 and pep4 deletions with atg2 or atg7 as controls in which autophagic bodies are not formed.

      Thank you for the valuable suggestion. We plan to perform these experiments that combine atg15 and pep4 deletions with atg2 or atg7 as controls.

      Figure 1E and F: R18 stains more than just the ER in the cells shown. In addition to atg39 and atg40, authors should include atg11 to inhibit all forms of selective autophagy.

      Thank you very much for the insightful comment. We agree and plan to include the atg11Δ mutant to inhibit all forms of selective autophagy.

      Figure S2A and B: The figures are mislabeled. Instead of FM4-64 it should say R18. In addition to the ER, in several images it is obvious to see R18 staining the vacuole membrane (for example Figure 2A 30 degrees) and others. Thus, the strong thresholding in S2 may give the reader an oversimplified view on R18 localization. This needs to be corrected.

      Thank you very much for pointing this out. We have corrected the labeling error in Figure S2A and B. Regarding the observation that R18 occasionally labels the vacuole membrane, we agree with the reviewer's comment. Based on our data, we believe that this signal likely reflects autophagosomes that have reached and fused with the vacuole, as expected in the later stages of autophagy. We have clarified this point in the text to avoid oversimplification of R18 localization (lines 169-171, 426-428).

      Figure 1G and H: In 1G, there are a number of R18-stained patches not co-labeled by GFP-ER. What are these patches and which organelles do they represent? In 1H, given the tight association of the ER (omegasome) with forming IMs, it is difficult to discern whether R18 labels surrounding ER membrane or the IM itself. This needs to be more closely analyzed. The authors need to quantify these data similar to the yeast data.

      Thank you for the suggestion. We plan to perform additional quantification and colocalization analysis to clarify the identity of R18-positive signals in 1G and 1H.

      Figure 4A-C: A full-length PLT-deficient variant of Atg2 has been analyzed by Dabrowski et al, JCB 2023 in vivo. This work needs to be cited and discussed. The analysis needs to include punctate Atg8 structures for WT cells to exclude effects due to expansion defects.

      Thank you for the suggestion. We have now cited and discussed the work by Dabrowski et al., JCB 2023 in the revised manuscript (lines 67-68, 312-314). In addition, we have included an analysis of punctate Atg8 structures in WT cells to address the concern regarding potential expansion defects.

      Figure 4F-H: To measure the size changes in IMs, the authors would need to perform these experiments without bleaching the mNG-Atg8 signals.

      We apologize for the lack of clarity. The method for measuring IM size has now been added to the revised manuscript. In Figure 4, we note that mNG-Atg8 fluorescence actually shows a slow recovery over time. This limited recovery likely reflects both the slower turnover of Atg8 and the fact that the pre-existing Atg8 pool at the IM was partially photobleached. We have now revised the main text to clarify this point and included additional explanation (line 326-330).

      Figure 5C: The authors need to indicate the bleached areas in the mNG-Atg8 image for easier orientation. It looks to me that the area that the authors mark as IM-ER MCS is really the IM in proximity to the ER. Thus, if lipid transfer to the IM has ceased, I would not expect recovery here. If the IM-ER MCS area includes IM and the ER to similar extent, I would expect exactly what the authors show: IM does not recover while ER quickly recovers. On average, we would observe reduced recovery as shown in 5D.

      Thank you for the helpful suggestion, and we apologize for the oversight during figure preparation. We have now clearly indicated the bleached areas in the merged image in Figure 5C for better orientation. Additionally, we have carefully re-examined the defined ER-IM MCS region and confirm that the quantified area indeed corresponds to the contact site between the ER and the IM. We also double-checked that the measurements shown in the figure remain correct.

      Figure 5L: Since mNG-Atg8 signal homogenously disappears from the IM, it is meaningless to measure size. How do the authors measure the size of something they cannot detect?

      Thank you for pointing this out. We agree with the reviewer's comment and have removed the panel from the revised version accordingly.

      Figure 5K: The authors need to show the whole bleached area over time for the reader to be able to see where the recovered R18 signal might be coming from. Currently, it is impossible to discern whether the signal comes from the IM or from slow recovery from neighboring ER.

      We appreciate this insightful comment. To address the concern and following the suggestion from Reviewer 2 (Major Point No.4), we have now revised the figure to include an additional measurement of fluorescence recovery in the adjacent bleached ER (Figure 5K and 5M) (lines 384-396). These results further support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS originates from the IM, rather than from the adjacent bleached ER, which shows slower and less efficient recovery.

      We have also added time-lapse videos to the supplementary information due to space limitations in the main figure.

      Reviewer #3 (Significance (Required)):

      The use of a lipophilic dye to monitor lipid dynamics during IM expansion or dissolution is an elegant way to probe the mechanisms of lipid transfer across ER-IM contact sites. Quantitative in vivo data is critically needed to address this fundamental question in autophagy and contact site biology. However, the study remains limited in providing direct evidence that it is indeed the lipid transfer activity of Atg2, which underlies the R18 dynamics in IMs in vivo.

      We sincerely thank the reviewer for this encouraging and thoughtful comment. We appreciate the recognition that our live-cell approach using a lipophilic dye provides a valuable framework to visualize lipid dynamics during autophagosome biogenesis. As the reviewer pointed out, quantitative in vivo evidence is critically needed in this field, and we hope our study contributes meaningfully toward that goal.

      We also fully acknowledge the limitation. While our current data offer indirect evidence for Atg2-mediated lipid transfer, we would like to support this through our revision plan and our parallel study (bioRxiv, 2025.05.24.655882v1), which shows that Atg2 is indeed a bridge-like LTP and that R18 transfer is lost in the bridge-structure-defective strain. Together, we hope these results support the conclusion that the lipid transfer activity of Atg2 underlies the observed R18 dynamics in vivo.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions their visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Weaknesses:

      The work would be further clarified and strengthened by additional explanation included in the main text, figure legends, and methods that would permit the reader to draw their own conclusions more feasibly. It would be helpful to have all figure panels referenced in the text and referenced in order, as they are currently not. In addition, it seems that sometimes the incorrect figure panel is referenced in the text, Figure S2 is mislabeled with D-E instead of A-C and Table S1 is not referenced in the main text at all. Table S1 is extremely important for understanding the figures in the main text and eliminating acronyms here would support reader comprehension, especially as there is no legend provided for Table S1. For example, a reader that does not specialize in vision may not know that OF stands for optic flow. Further detail in figure legends would also support the reader in drawing their own conclusions. For example, dashed red lines in Figures 3 and 4 A and B are not described and the letters representing statistical significance could be further explained either in the figure legend or materials to help the reader draw their own conclusions.

      We appreciate the suggestions to improve the clarity of the manuscript. We have extensively re-structured the entire manuscript. Among other changes, we have referenced all figure panels in the text in the order they appear. To do so, we combined the optic flow and contrast measurements of our setup with the methods description of the behavioural experiments (formerly Figs. 5 and 2, respectively). This new Fig. 2 now introduces the methods of the study, while the remainder of the former Fig. 2, which presented the experiments that investigated the ventrolateral and dorsal response in more detail, is now a separate figure (Fig. 3). This arrangement also better balances the amount of information contained in each figure.

      Reviewer #2 (Public review):

      Summary:

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response: instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight.

      Strengths:

      The data are very interesting, unique, and compelling. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses:

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      We thank the reviewer for the feedback, and the suggestions for improvement of the manuscript (our implementations are detailed below). We fully agree that this study raises several intriguing questions regarding the dorsal visual response, including how the animals perceive and respond to rotational optic flow in their dorsal visual field, particularly since rotational optic flow may be processed separately from translational optic flow.

      In our free-flight setup, it was not possible to generate rotational optic flow in a controlled manner. To explore this aspect more systematically, a tethered-flight setup would be ideal, or alternatively, a free-flight setup integrated with virtual reality. This would be a compelling direction for a follow-up study.

      Reviewer #3 (Public review):

      The central goal of this paper as I understand it is to extract the "integration hierarchy" of stimulus in the dorsal and ventrolateral visual fields. The segregation of these responses is different from what is thought to occur in bees and flies and was established in the authors' prior work. Showing how the stimuli combine and are prioritized goes beyond the authors' prior conclusions that separated the response into two visual regions. The data presented do indeed support the hierarchy reported in Figure 5 and that is a nice summary of the authors' work. The moths respond to combinations of dorsal and lateral cues in a mixed way but also seem to strongly prioritize avoiding dorsal optic flow which the authors interpret as a closed and potentially dangerous ecological context for these animals. The authors use clever combinations of stimuli to put cues into conflict to reveal the response hierarchy.

      My most significant concern is that this hierarchy of stimulus responses might be limited to the specific parameters chosen in this study. Presumably, there are parameters of these stimuli that modulate the response (spatial frequency, different amounts of optic flow, contrast, color, etc). While I agree that the hierarchy in Figure 5 is consistent for the particular stimuli given, this may not extend to other parameter combinations of the same cues. For example, as the contrast of the dorsal stimuli is reduced, the inequality may shift. This does not preclude the authors' conclusions but it does mean that they may not generalize, even within this species. For example, other cue conflict studies have quantified the responses to ranges of the parameters (e.g. frequency) and shown that one cue might be prioritized or up-weighted in one frequency band but not in others. I could imagine ecological signatures of dorsal clutter and translational positioning cues could depend on the dynamic range of the optic flow, or even having spatial-temporal frequency-dependent integration independent of net optic flow.

      We absolutely agree that, in principle, an observed integration hierarchy is only valid for the stimuli tested. Yet, we do believe that we provide good evidence that our key observations are also robust for stimuli related to the ones tested:

      Most importantly, we found that both pathways act in parallel (and are not mutually exclusive, or winner-takes-all, for example), when the animals can enact the locomotion induced by the dorsal and ventrolateral pathway. We tested this with the same dorsal cue (the line switching direction), but different behavioural paradigms (centring vs unilateral avoidance), and different ventrolateral stimuli (red gratings of one spatial frequency, and 100% nominal contrast black-and-white checkerboard stimuli which comprised a range of spatial frequencies) – and found the same integration strategy.

      Certainly, if the contrast of the visual cues was reduced to the point that the dorsal or ventrolateral responses became weaker, we would expect this to be visible in the combined responses, with the respective reduction in response strength for either pathway, to the same degree as they would be reduced when stimuli were shown independently in the dorsal and ventrolateral visual field.

      For testing whether the animals would show a weighting of responses when it was not possible to enact locomotion to both pathways, we felt it was important to use similar external stimuli to be able to compare the responses. So we can confidently interpret their responses in terms of integration. Indeed, how this is translated to responses in the two pathways depends a) on the spatiotemporal tuning, contrast sensitivity and exact receptive fields of the two systems, b) the geometry of the setup and stimulus coverage, and therefore the ability of the animals to enact responses to both pathways independently and c) on the integration weights.

      It would indeed be fascinating to obtain this tuning and the receptive fields, and having these, test a large array of combinations of stimuli and presentation geometries, so that one could extract integration weights for different presentation scenarios from the resulting flight responses in a future study.

      We also expanded the respective discussion section to reflect these points: l. 391-417. We also updated the former Fig. 5, now Fig. 6 to reflect this discussion.

      The second part of this concern is that there seems to be a missed opportunity to quantify the integration, especially when the optic flow magnitude is already calculated. The discussion even highlights that an advantage of the conflict paradigm is that the weights of the integration hierarchy can be compared. But these weights, which I would interpret as stimulus-responses gains, are not reported. What is the ratio of moth response to optic flow in the different regions? When the moth balances responses in the dorsal and ventrolateral region, is it a simple weighted average of the two? When it prioritizes one over the other is the response gain unchanged? This plays into the first concern because such gain responses could strongly depend on the specific stimulus parameters rather than being constant.

      Indeed, we set up stimuli that are comparable, as they are all in the visual domain, and since we can calculate their external optic flow and contrast magnitudes, to control for imbalances in stimulus presentation, which is important for the interpretation of the resulting data.

      As we discussed above, we are confident that we are observing general principles of the integration of the two parallel pathways. However, we refrained from calculating integration weights, because these might be misleading for several reasons:

      (1) In situations where the animals can enact responses to both pathways, we show that they do so at the full original magnitudes. So there are no “weights” of the hierarchy in this case.

      (2) Only when responses to both systems are not possible in parallel, do we see a hierarchy. However, combined with point (1), this hierarchy likely depends on the geometry of the moths’ environment: it will be more pronounced the less both systems can be enacted in parallel.

      (3) The hierarchy also does not affect all features of the dorsal or ventrolateral pathway equally. The hawkmoths still regulate their perpendicular distance to ventral gratings with dorsal gratings present, to the same degree as with only ventral gratings, because perpendicular distance regulation is not a feature of the dorsal response. And while the hawkmoths show a significant reduction in their position adjustment to dorsal contrast when it is in conflict with lateral gratings (Fig. 4C), they show exactly the same amount of lateral movement and speed adjustment as for dorsal gratings alone, when not combined with lateral ones (Fig. 4D and Fig. S3A). So even for one particular setup geometry and stimulus combination, there clearly is not one integration weight for all features of the responses.

      We extended the discussion section to clarify these points: “The benefit of our study system is that the same cues activate different control pathways in different regions of the visual field, so that the resulting behaviour can directly be interpreted in terms of integration weights” (l. 448-451).

      See also l. 391-417; we also updated the former Fig. 5, now Fig. 6, to reflect this discussion.

      The authors do explain the choice of specific stimuli in the context of their very nice natural scene analysis in Fig. 1 and there is an excellent discussion of the ecological context for the behaviors. However, I struggled to directly map the results from the natural scenes to the conclusions of the paper. How do they directly inform the methods and conclusions for the laboratory experiments? Most important is the discussion in the middle paragraph of page 12, which suggests a relationship with Figure 1B, but seems provocative but lacking a quantification with respect to the laboratory stimuli.

      We show that contrast cues and translational optic flow are not homogeneously distributed in the natural environments of hawkmoths. This directly relates to our laboratory findings on the responses to these stimuli in different parts of their visual field. In order to interpret the results of these behavioural experiments with respect to the visual stimuli, we performed measurements of translational optic flow and contrast cues in the laboratory setup. As a result, we make several predictions about the animals’ use of translational optic flow and contrast cues in natural settings:

      a) Hawkmoths in the lab responded strongest to ventral optic flow, even though it was not stronger in magnitude, given our measurements, than lateral optic flow. Thus, we propose that the stronger response to ventral optic flow might be an evolutionary adaptation to the natural distribution of translational optic flow cues.

      b) In the natural habitats of hawkmoths, dorsal coverage is much less frequent than ventrolateral structures generating translational optic flow, yet the hawkmoths responded with a much higher weight to the former. Moreover, in our flight tunnel experiments, the animals responded with the same or higher weights to dorsal cues, which had a lower magnitude of translational optic flow and contrast than the same cues in the ventrolateral visual field. So we showed, by combining behavioural experiments and stimulus measurements in the lab, that the weighting of dorsal and ventrolateral cues did not follow their stimulus magnitude in the lab. Moreover, comparing to the natural cue distributions, we suggest that the integration weights also did not evolve to match the prevalence of these cues in natural habitats.

      We integrated the measurements of natural visual scene statistics in the new Fig. 6, to relate the behavioural findings to the natural context also in the figure structure, and sequence logic of the text, as they are discussed here.

      The central conclusion of the first section of the results is that there are likely two different pathways mediating the dorsal and the ventrolateral response. This seems reasonable given the data, however, this was also the message that I got from the authors' prior paper (ref 11). There are certainly more comparisons being done here than in that paper and it is perfectly reasonable to reinforce the conclusion from that study but I think what is new about these results needs to be highlighted in this section and differentiated from prior results. Perhaps one way to help would be to be more explicit with the open hypotheses that remain from that prior paper.

      We appreciate the suggestion to highlight more clearly what the open questions that are addressed in this study are. As a result, we have entirely restructured the introduction, added sections to the discussion and fundamentally changed the graphical result summary in Fig. 6, to reflect the following new findings (and differences to the previous paper):

      The previous paper demonstrated that there are two different pathways in hummingbird hawkmoths that mediate visual flight guidance, and newly described one of them, the dorsal response. This established flight guidance in hummingbird hawkmoths as a model for the questions asked in the current study, which are very different in nature from the previous paper.  

      The main question addressed in the current study is how these two flight guidance pathways interact to generate consistent behaviour? Throughout the literature of parallel sensory and motor pathways guiding behaviour, there are different solutions – from winner-takes-all to equal mixed responses. We tested this fundamental question using the hummingbird hawkmoth flight guidance systems as a model.

      This is the main question addressed in the various conflict experiments in this study, and we show that indeed, the two systems operate in parallel. As long as the animals can enact both dorsal and optic-flow responses, they do so at the original strengths of the responses. Only when this is not possible, hierarchies become visible. We carefully measured the optic flow and contrast cues generated by the different stimuli to ensure that the hierarchies we observed were not generated by imbalances of the external stimuli.

      - Does the interaction hierarchy of the two pathways follow the statistics of natural environments? We previously showed qualitatively how optic flow and contrast cues are distributed across the visual field in natural habitats of the hummingbird hawkmoth. In this study, we quantitatively analysed the natural image data, including a new analysis for the contrast edges, and statistically compared the results across conditions. This quantitative analysis supported the previous qualitative assessment that the prevalence of translational optic flow was highest in the ventral and lowest in the dorsal visual field in all natural habitat types. The distribution of contrast edges across the visual field depended on habitat type much more strongly than was visible in the qualitative analysis in the previous paper. When compared to the magnitude of the behavioural responses, and considering that the hummingbird hawkmoth is predominantly found in open and semi-open habitats, the natural distributions of optic flow and contrast edges did not align with the response hierarchy observed in our laboratory experiments. Dorsal cues elicited much stronger responses relative to ventrolateral optic flow responses than would be expected.

      To provide a more complete picture of the dorsal pathway, which will be important to understand its nature and to compare it to other species, we conducted additional experiments that were specifically set up to test for response features known from the translational optic flow response, in order to compare and contrast the two systems. These experiments allowed us to show that the dorsal response is not simply a translational optic flow reduction response that creates much stronger output than the ventrolateral optic flow response. In particular, we show that the dorsal response was lacking the perpendicular distance regulation of the optic flow response, while it did provide alignment with prominent contrasts (possibly to reduce the perceived translational optic flow), which is not observed in the ventrolateral optic flow response. The strong avoidance of any dorsal contrast cues, not just those inducing translational optic flow, is another feature not found in the ventrolateral pathway.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Many comparisons between visual conditions are made and it was confusing at times to know which conditions the authors were comparing. Thinking of a way to label each condition with a letter or number so that the authors could specify which conditions are specifically being compared would greatly enhance comprehension and readability.

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      Consider adding in descriptive words to the y-axis labels for the position graphs that would help the reader quickly understand what a positive or negative value means with respect to the visual condition.

      We did now change the viewpoint on the example tracks in Figs. 2-5, to take a virtual viewpoint from the top, not as the camera recorded from below, which requires some mental rotation to reconcile the left and right sides. Moreover, we noticed that the example track axes were labelled in mm, while the axes for the plots showing median position in the tunnel were labelled in cm. We reconciled the units as well. This will make it easier to see the direct equivalent of the axis (as well as positive and negative values) in the example tracks in those figures, and the median positions, as well as the cross-index.

      There are no line numbers provided so it is a bit challenging to provide feedback on specific sentences but there are a handful of typos in the manuscript, a few examples:

      (1) Cue conflict section, first paragraph: "When both cues were presented to in combination, ..." (remove to)

      (2) The ecological relevance section, first paragraph, first sentence: "would is not to fly"

      (3) Figure S3 legend: explanation for C is labeled as B and B is not included with A

      We apologise for the missing line numbers. We added these and resolved the issues 1-3.

      Reviewer #2 (Recommendations for the authors):

      - The pictograms in Fig. 1a were at first glance not clear to me, maybe adding l, r, d, v to the first pictogram could make the figure more immediately accessible.

      We added these labels to make it more accessible.

      - I would suggest noting in the main text that the red patterns were chosen for technical reasons (see Methods), if this is correct.

      We added this information and a reference to the methods in the main text (lines 100-102).

      - "Thus, hawkmoths are currently the only insect species for which a partitioning of the visual field has been demonstrated in terms of optic-flow-based flight control [33-35]." I think that is a bit too strong and maybe it would be more interesting to connect the current data to connected data in other insects to perhaps discuss important similarities. Ref 32 for example shows that fruit flies weigh ventral translational optic flow considerably more than dorsal translational optic flow. Reichardt 1983 (Naturwissenschaften) showed that stripe fixation in large flies (a behaviour relying in part on the motion pathway) is confined to the ventral visual field, etc...

      We have changed this sentence to acknowledge partitioning in other insects and to motivate the use of our model species for this study: While fruit flies weight ventral translational optic flow more strongly than dorsal optic flow, the most extreme partitioning of the visual field in terms of optic-flow-based flight control has been observed in hawkmoths [33-35]. (lines 60-62)

      - I think the statistical differences group mean differences could be described in more detail at least in Fig. 2 (to me the description was not immediately clear, in particular with the double letters).

      We added an explanation of the letter nomenclature to all respective figure legends:

      Black letters show statistically significant differences in group means or medians, depending on the normality of the test residuals (see Methods, significance level: 5%). The red letters represent statistically significant differences in group variance from pairwise Brown–Forsythe tests (significance level 5%). Conditions with different letters were significantly different from each other. The white boxplots depict the median and 25% to 75% range, the whiskers represent the data exceeding the box by more than 1.5 interquartile ranges, and the violin plots indicate the distribution of the individual data points shown in black.
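
      As an illustration of the variance comparison named in this legend, the sketch below computes the Brown–Forsythe statistic for a pair of groups: a one-way ANOVA F statistic calculated on the absolute deviations of each observation from its own group median. It is a standard-library sketch with made-up numbers, not the analysis code behind the figures. Converting F to a p-value requires the F distribution from a statistics package and is omitted here.

```python
from statistics import mean, median


def brown_forsythe_f(*groups):
    """Return (F, df_between, df_within) for the Brown-Forsythe test."""
    # Absolute deviation of every observation from its own group's median.
    devs = []
    for g in groups:
        centre = median(g)
        devs.append([abs(x - centre) for x in g])

    n_total = sum(len(d) for d in devs)
    k = len(devs)
    grand_mean = sum(sum(d) for d in devs) / n_total
    group_means = [mean(d) for d in devs]

    # Standard one-way ANOVA sums of squares on the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((z - m) ** 2 for d, m in zip(devs, group_means) for z in d)

    df_between, df_within = k - 1, n_total - k
    f_stat = (between / df_between) / (within / df_within)
    return f_stat, df_between, df_within


# Made-up lateral positions for two hypothetical conditions.
narrow = [1.0, 1.2, 0.9, 1.1, 1.0]   # tightly clustered flights
wide = [0.2, 2.1, -0.5, 1.9, 1.3]    # widely scattered flights

f_stat, df1, df2 = brown_forsythe_f(narrow, wide)
```

      A large F relative to the F(df1, df2) distribution indicates that the spread differs between the two conditions, which is what distinct red letters would denote.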

      - "When translational optic flow was presented laterally" I would use a more wordy description, since it is the hawkmoth that is controlling the optic flow and in addition to translational optic flow, there might also be rotational components, retinal expansion etc.

      We extended the description to explain that the moths were generating the optic flow percept based on stationary gratings in different orientations, by way of their flight through the tunnel. Lines 127-129

      - While it is clearly stated that the measure of the perpendicular distance from the ventral and dorsal pattern via the size of the insect as seen by the camera is indirect, I would suggest to determine the measurement uncertainty of distance estimate.

      - Connected to above - is the hawkmoth area averaged over the entire flight and is the variance across frames similar in all the stimuli conditions? Is it, in principle, conceivable that the hawkmoths' pitch (up or down) is different across conditions, e.g. with moths rising and falling more frequently in a certain condition, which could influence the area in addition to distance?

      There are a number of sources that generate variance in the distance estimate (which was based on the size of the moth in each video frame, after background subtraction): the size of the animal, the contrast with which the animal was filmed (which also depended on the type of pattern in the tunnel: it was lower with ventral or dorsal patterns as a background than with lateral ones), and the speed of the animal, as motion blur could impact the moth’s image on the video. The latter is hard to calibrate, but the uncertainty related to animal size and pattern types could theoretically be estimated. However, since we moved between finishing the data acquisition for this study and publishing the paper, the original setup has been dismantled. We could attempt to recreate it as faithfully as possible, but would be worried about introducing further noise. We therefore decided not to attempt to characterise the uncertainty, so as not to give a false impression of quantifiability of this measure. For the purpose of this study, it will have to remain a qualitative, rather than a quantitative, measure. If we should use a similar measure again, we will make sure to quantify all sources of uncertainty that we have access to.

      The variance in area is different between conditions. Most likely, the animals vary their flight height differently for different dorsal and ventral patterns, as they vary their lateral flight straightness with different lateral visual input. For the reasons mentioned above, we cannot disentangle the effects of variations in flight height and other sources of uncertainty relating to animal size in the video frames. We therefore averaged the extracted area across the entire flight, to obtain a coarse measure of their flight height. Future studies focusing specifically on the vertical component or filming in 3D will be required to determine the exact amount of vertical flight variation.
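
      The coarse height measure described here can be sketched as follows. Under an idealised pinhole-camera assumption (made for illustration only), the imaged area of an object scales with the inverse square of its distance to the camera, so the flight-averaged pixel area yields only a relative distance between flights. Function names and values are hypothetical, and the sketch ignores the body-size, contrast, and motion-blur effects discussed above.

```python
from statistics import mean, pvariance


def summarise_flight(areas_px):
    """Flight-averaged segmented area and its across-frame variance."""
    return mean(areas_px), pvariance(areas_px)


def relative_distance(mean_area, reference_area):
    """Distance relative to a reference flight, assuming area ~ 1 / distance**2."""
    if mean_area <= 0 or reference_area <= 0:
        raise ValueError("areas must be positive")
    return (reference_area / mean_area) ** 0.5


# Made-up per-frame areas (pixels) for two hypothetical flights.
flight_a = [400, 410, 390, 400]  # animal appears large: close to the camera
flight_b = [100, 105, 95, 100]   # animal appears small: further away

mean_a, var_a = summarise_flight(flight_a)
mean_b, var_b = summarise_flight(flight_b)
ratio = relative_distance(mean_b, mean_a)  # distance of flight_b relative to flight_a
```

      Because the absolute scale depends on the true size of each animal, the ratio is only meaningful as a qualitative comparison, in line with how the measure is treated in the response.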

      - Results second paragraph, suggestion: pattern wavelength or spatial frequency instead of spatial resolution.

      - Same paragraph, suggestion: For an optimal wavelength/spatial frequency of XX

      We corrected these to spatial frequency.

      - Above Fig 3- "this strongly suggests a different visual pathway". In my opinion it would be better to say sensory-motor /visuomotor pathway or to more clearly define visual pathway? Could one in principle imagine a uniform set of local motion sensitive neurons across the entire visual field that connect differentially to descending/motor neurons.

      We appreciate this point and changed this, and further instances in the manuscript to visuomotor pathway.

      - If I understood correctly, you calculated the magnitude of optic flow in the different tunnel conditions based on the image of a fisheye camera moving centrally in the tunnel, equidistant from all walls. I did not understand why the magnitude of optic flow should differ between the four quadrants showing the same squarewave patterns. Apologies if I missed something, but maybe it is worth explaining this in more detail in the manuscript.

      We recognize that this point may not have been immediately clear and have therefore provided additional clarification in the Methods and results section (lines 106-111, 543-549). We anticipated differences in the magnitude of optic flow due to potential contrast variations arising from the way the stimuli were generated—being mounted on the inner surfaces of different tunnel walls while the light source was positioned above. On the dorsal wall, light from the overhead lamps passed through the red material. For laterally mounted patterns, the animals perceived mainly reflected light, as these tunnel walls were not transparent.

      A similar principle applied to the background, which consisted of a white diffuser allowing light to pass through dorsally, but white non-transmissive paper laterally, with a 5% contrast random checkerboard pattern. The ventral side presented a more complex scenario, as it needed to be partially transparent for the ventrally mounted camera. Consequently, the animals perceived a combination of light reflections from the red patterns and the white gauze covering the ventral tunnel side, against the much darker background of the surrounding room.

      To ensure that the observed flight responses were not artifacts of deviations in visual stimulation from an ideal homogeneous environment, we used the camera to quantify the magnitude of optic flow and contrast patterns under these real experimental conditions. This approach also allowed us to directly relate the optic flow measurements taken indoors to those recorded outdoors, as we employed the same camera and analytical procedures for both datasets.

      Reviewer #3 (Recommendations for the authors):

      In addition to the considerations above I had a few minor points:

      There are so many different directions of stimuli and response that it is quite challenging to parse the results. Can this be made a little easier for the reader?

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see Table S1) and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      One suggestion (only a suggestion): I found myself continuously rotating the violin plots in my head so that the lateral position axis lined up with the lateral position of the tunnel icons below. Consider whether rotating the plots 90 degrees would help interpretability. It was challenging to keep track of which side was which.

      We discussed this with a number of test readers and tried multiple configurations. All have advantages and drawbacks, but the majority of testers preferred the current configuration. To ease the mental transformation from the example flight tracks, we now present the example flight tracks in Figs. 2-5 in the same reference frame as the figures showing median position (so positive and negative values on those axes correspond directly), and changed the view from below the tunnel to above the tunnel, as this is the more typical depiction. We hope that this enhances readability.

      Are height measurements sensitive to the roll and pitch of the animal? I suspect this is likely small but worth acknowledging.

      They are indeed. These effects are likely small but contribute to the overall inaccuracy, which we could not quantify in this particular setup (see also the response to reviewer 2 on that point). For this reason, the height measurements have to be considered a qualitative approximation rather than a quantification of flight height. We added text to acknowledge the effects of roll and pitch specifically (lines 657-658).

      The Brown-Forsythe test was reported as paired but this seems odd because the same moths were not used in each condition. Maybe the authors meant something different by "paired" than a paired statistical design?

      Indeed, the data were not paired in the sense that we could attribute individual datapoints to individual moths across conditions. We applied the Brown-Forsythe test in a pairwise manner, comparing the variances of two conditions at a time, to test whether the variance in position differed across conditions. Our phrasing was misleading, and we have corrected it to "The variance in the median lateral position (in other words, the spread of the median flight position) was statistically compared between the groups using the pairwise Brown–Forsythe tests" (lines 187-188).
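
      For readers unfamiliar with the procedure, a minimal sketch of a pairwise Brown-Forsythe comparison follows. It is an illustration only, not the analysis code used in the study: the function names and condition labels are hypothetical, comparing all possible pairs is an assumption, only the test statistic is computed, and no correction for multiple comparisons is applied. The statistic is a one-way ANOVA F computed on absolute deviations from each group's median. A p-value would follow from the F distribution with k-1 and N-k degrees of freedom, which in practice would come from a statistics package.

```python
from itertools import combinations
from statistics import mean, median


def brown_forsythe_stat(*groups):
    """Brown-Forsythe statistic: ANOVA F on absolute deviations from group medians."""
    # Transform each observation into its absolute deviation from its own group median.
    devs = []
    for g in groups:
        m = median(g)
        devs.append([abs(x - m) for x in g])

    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand_mean = mean(v for d in devs for v in d)
    group_means = [mean(d) for d in devs]

    # Between-group and within-group mean squares of the deviations.
    between = sum(
        len(d) * (gm - grand_mean) ** 2 for d, gm in zip(devs, group_means)
    ) / (k - 1)
    within = sum(
        (v - gm) ** 2 for d, gm in zip(devs, group_means) for v in d
    ) / (n_total - k)
    # Note: within is zero (division error) if all deviations in every group are identical.
    return between / within


def pairwise_brown_forsythe(conditions):
    """Compare spread between every pair of conditions (labels are hypothetical)."""
    return {
        (a, b): brown_forsythe_stat(conditions[a], conditions[b])
        for a, b in combinations(sorted(conditions), 2)
    }
```

      Because the statistic is built from deviations around each group's own median, shifting a whole group sideways leaves it unchanged; only differences in spread contribute.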

      There is some concern about individual moth preferences and bias due to repeated measures. I appreciate that the individual moth's identity was not likely known in most cases, but can the authors provide an approximate breakdown of how many individual moths provided the N sample trajectories?

      This is a very valid concern, and indeed one we did investigate in a previous study with this setup. We confirmed that the majority of animals (70%, 68% and 53% out of 40 hawkmoths, measured on three consecutive days) crossed the tunnel within a randomly picked window of 3 h (Stöckl et al. 2019). We now state this explicitly in the methods section (lines 594-597). Thus, for the sample sizes in our study, statistically, each moth would have contributed a small number of tracks compared to the overall number of tracks sampled.

      The statistics section of the methods said that both Tukey-Kramer (post-hoc corrected means) and Kruskal-Wallis (non-parametric medians) were done. It is sometimes not clear which test was done for which figure, and where the Kruskal-Wallis test was done there does not seem to be a corrected statistical significance threshold for the many multiple comparisons (Fig. 2). It is quite possible I am just missing the details and they need to be clarified. I think there also needs to be a correction for the Brown-Forsythe tests but I don't know this method well.

      We first performed an ANOVA and, if the test residuals were not normally distributed, used a Kruskal-Wallis test instead. For the post-hoc tests following either, we used Tukey-Kramer to correct for multiple comparisons. The figure legends were indeed missing this information. We added it to clarify our statistical analysis strategy and refer to the methods section for more details (i.e. lines 185-186). All statistical results, including the type of statistical test used, have been uploaded to the data repository as well.
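
      As an illustration of the non-parametric fallback mentioned above, the following is a minimal sketch of the Kruskal-Wallis H statistic with the standard tie correction. It is not the code used in the study, and the function name is hypothetical. The normality check on the ANOVA residuals and the Tukey-Kramer post-hoc step are not shown. Under the null hypothesis, H is approximately chi-square distributed with k-1 degrees of freedom.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (illustrative sketch)."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)

    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        for _, gi in pooled[i:j]:
            rank_sums[gi] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    # Tie correction; undefined (division by zero) if all values are identical.
    return h / (1 - tie_term / (n**3 - n))
```

      Because the statistic depends only on ranks, it is unaffected by any monotonic rescaling of the position data, which is why it serves as a fallback when residuals are not normally distributed.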

      The connection to stimulus reliability in the discussion seems to conflate reliability with prevalence or magnitude.

      We have rephrased the respective discussion sections to clearly separate the prevalence and magnitude of stimuli, which was measured, from an implied or hypothesized reliability (lines 510-511).

      Line numbers would be helpful for future review.

      We apologize for missing the line numbers and have added them to the revised manuscript.

    1. y aristotle aristotle This moment reflects the complex changing attitudes towards the capabilities of women during the Enlightenment. Unca Eliza both establishes her womanhood as something inferior to male intellectuals and simultaneously places herself above classical scholars. While women at the time were becoming increasingly established in intellectual fields they were still seen as inferior to men. In Women and Enlightenment in Eighteenth Century Britain (Cambridge University Press, 2009), Karen O'Brien writes that, while the progress of women in eighteenth-century Britain cannot be likened to feminism as we know it, the Enlightenment "created a framework and a language for understanding the gendered structures of society without which nineteenth-century feminism would not have been possible" (2). - [UOStudStaff], w

      extra spaces need deleting