  1. Feb 2026
    1. Reference Framework on Control Measures in Schools: Summary Note

      https://www.youtube.com/watch?v=D43t0L_G7-Y

      Executive summary

      This reference document, produced jointly by the Ministry of Education (MEQ) and the Fédération des centres de services scolaires du Québec (FCSSQ), sets out national guidelines on the use of control measures (restraint and seclusion) in educational institutions.

      The fundamental premise is that these measures must be considered only as a last resort, exclusively in emergencies where the safety of the student or of others is under imminent threat.

      The framework favors a preventive, educational approach structured around the multi-tiered system of supports (SSPM), with the aim of keeping the use of force or coercion to a minimum.

      It clarifies legal and professional responsibilities, notably since the regulatory amendments of October 2023 that authorize certain professionals (psychologists and psychoeducators) to decide on the use of restraint measures.

      Implementation rests on a rigorous five-step process, including the development of specific protocols (school or student) and the application of post-incident procedures to safeguard well-being and keep practices under constant review.

      1. Foundations and guiding principles

      The use of control measures is strictly governed by legal references (Charter of Rights and Freedoms, Civil Code, Education Act) and must respect the principles of the student's dignity, integrity and safety.

      Fundamental principles of intervention:

      Last resort: Used only when preventive interventions and alternative measures have failed.

      Imminent danger: The threat must be characterized by its foreseeability, its immediacy and the severity of its consequences.

      Minimal constraint: The measure must be as unrestrictive as possible and last as short a time as possible (it ends as soon as the danger has passed).

      Respect and dignity: The intervention must be carried out with kindness and human warmth, under constant supervision.

      Mandatory follow-up: Every application must be followed by a post-incident review to assess its effectiveness and adjust future interventions.

      2. Definitions of control measures

      The framework distinguishes several types of intervention so that the school network shares a common understanding.

      | Type of measure | Description | Examples |
      | --- | --- | --- |
      | Physical restraint | Use of human force to immobilize or direct a student against their will. | Holding the arm of a student who resists, or holding the student if they are hitting. |
      | Mechanical restraint | Use of equipment or material to limit movement. | Safety mitts, restraint vests in school transport. |
      | Removal of equipment | Confiscation of a device that normally compensates for a disability. | Removing the brakes from a wheelchair or confiscating a walker. |
      | Seclusion | Confining the student to a place they cannot freely leave. | Holding the handle of a closed door or physically blocking the way out. |

      Note: Administering chemical substances for control purposes requires a medical prescription and is not covered in this document.

      3. Operational framework: planned vs unplanned intervention

      The framework distinguishes two contexts of application, which directly affect professional responsibilities.

      | Characteristic | Unplanned intervention | Planned intervention |
      | --- | --- | --- |
      | Context | Unusual, unpredictable behavior. | Known behavior that is likely to recur. |
      | Management tool | School protocol (universal). | Student protocol (personalized, tied to the intervention plan). |
      | Decision (restraint) | Non-reserved activity (emergency). | Activity reserved for authorized professionals. |
      | Decision (seclusion) | Non-reserved activity. | Non-reserved activity (but regulated). |
      | Application | Non-reserved activity. | Non-reserved activity. |

      4. The five-step intervention process

      To ensure safety and respect for rights, the framework proposes a systematic structure:

      1. Developing the protocol: Setting guidelines in advance (school committee for the school protocol; school team and parents for the student protocol).

      2. Applying preventive and alternative interventions: Using educational strategies to avert the crisis (diversion, securing the environment).

      3. Assessing the danger: Rigorous analysis of the situation against the criteria of foreseeability, immediacy and severity.

      4. Applying the control measure: Carrying it out according to the protocol's guidelines and professional recommendations.

      5. Post-incident procedures: Debriefing the event, establishing the facts, supporting witnesses (students and adults) and revising the protocol.

      5. Prevention and school climate

      Prevention is the "first course of action". The document stresses the importance of the multi-tiered system of supports (SSPM):

      Tier 1 (Universal): Proactive support for all students (healthy climate, clear rules, positive relationships).

      Tier 2 (Targeted): Additional support for at-risk students (self-regulation, social skills).

      Tier 3 (Intensive): Individualized interventions for serious or persistent difficulties.

      The CSSMB's "3 x 3" model is cited as an example: it crosses the intensity of the intervention with the individual, school and family spheres.

      6. Key roles and responsibilities

      The success of this framework depends on shared responsibility:

      School administration: Coordinates the development of protocols, ensures staff training and looks after everyone's physical and psychological well-being.

      Authorized professional staff (occupational therapists, nurses, physicians, physiotherapists, psychoeducators, psychologists): Carry out the clinical assessment, decide on the measure in a planned context and issue recommendations.

      School staff: Help analyze behaviors, apply the measures in line with the protocols and keep the administration informed.

      Parents and students: Must be actively involved in developing the student protocol. Free and informed consent is required for any planned measure.

      Key quotations and information

      "A control measure [...] is a last-resort intervention that should be carried out exclusively in an emergency, that is, when the safety of staff or students is threatened." (Bernard Drainville, Minister of Education)

      "The use of a control measure is not advocated in schools. [...] It must never be used as an educational or punitive measure, or to make supervising the student easier." (Source document, Section 1.1)

      "The use of control measures is likely to cause physical and psychological injuries that can have long-term implications." (Source document, Section 1)

    1. spaces

      Use underscores (_) instead, or failing that, hyphens (-).

      Periods work but are best avoided: CSS uses the period for class selectors (a selector such as #a.b reads as "id a with class b"), which can cause confusion.

      #ancre

      #Exemple_oiseau_migrateur
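The naming advice above can be sketched as a small helper. This is only an illustration: the function name to_anchor_id is hypothetical, and replacing periods as well as spaces is an assumption drawn from the note about CSS.

```python
import re


def to_anchor_id(title: str) -> str:
    """Turn a heading title into an id that is safe to use after '#' in a URL."""
    # Collapse each run of whitespace into a single underscore.
    anchor = re.sub(r"\s+", "_", title.strip())
    # Replace periods too: in CSS, "#a.b" means "id a with class b".
    return anchor.replace(".", "_")


print(to_anchor_id("Exemple oiseau migrateur"))  # Exemple_oiseau_migrateur
```

Hyphens would work equally well as the replacement character; underscores are used here only because the note lists them first.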

    2. <h1>

      There are six heading levels, h1 through h6. h1 is rendered in the largest size and h6 in the smallest.

      Caution: use only one h1 per page. It is the title displayed on the page. The other heading levels can be used as many times as needed.

      Headings do more than display a hierarchy of titles on the page. Google and other search engines read them to work out what the page is about, and they give headings more weight than ordinary text when assessing content. If a page has several h1 elements (there should be only one per page), Google takes weight (importance) away from one of the titles, which is undesirable for search engine optimization (making the site easy for users to find in Google).

      Cheating does not work either: Google detects headings whose content has nothing to do with them, or too many headings relative to the amount of content.
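As a rough way to check the "one h1 per page" rule, the sketch below counts heading tags with Python's standard html.parser module. The class name HeadingCounter, the helper count_headings and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Count how many times each heading tag (h1..h6) is opened."""

    def __init__(self):
        super().__init__()
        self.counts = {f"h{level}": 0 for level in range(1, 7)}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag in self.counts:
            self.counts[tag] += 1


def count_headings(html: str) -> dict:
    parser = HeadingCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<body>
  <h1>Migratory birds</h1>
  <h2>Swallows</h2>
  <h2>Storks</h2>
</body>
"""

counts = count_headings(SAMPLE_PAGE)
print(counts["h1"], counts["h2"])  # 1 2
```

A page that passes this check has exactly one h1; the other levels may appear any number of times.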

    3. index.html

      It is essential to give the file exactly this name. Also check that it has not been saved as index.html.txt, for example, because that will not work!

      Write "index" in lowercase, not "Index": on a case-sensitive server these are two different files.

      If the page is named accueil.html or ma-page.html, the site will not open automatically!

      When a web server receives a request for a folder, it looks by default for a file named index. This is what lets visitors reach www.monsite.com without having to type www.monsite.com/accueil.html.

      Skeleton (minimal structure) of this file:

      An index.html file should always contain these elements, in this order, to be considered "clean":

      <!DOCTYPE html>
      <html>
        <head>
          <meta charset="UTF-8">
          <title>Ma page...</title>
        </head>
        <body>
          <h1>Bienvenue sur mon site</h1>
        </body>
      </html>

      Note that the title element is not in the body but in the head: it is the title intended for Google (and shown in the browser tab), whereas the title displayed on the page itself goes in the h1 element.

      The index.html file must always sit at the root of the folder that contains the website; otherwise the server will not find it on its own.
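The lookup described above can be pictured with a tiny sketch. It assumes a server that only accepts the exact, case-sensitive name index.html; real servers usually let the administrator configure a list of default file names, and the function name find_default_document is hypothetical.

```python
from typing import Iterable, Optional


def find_default_document(filenames: Iterable[str]) -> Optional[str]:
    """Return the file a server would serve for a bare folder URL, or None."""
    # Exact, case-sensitive comparison, as on a typical Linux host (assumption).
    for name in filenames:
        if name == "index.html":
            return name
    return None


print(find_default_document(["style.css", "index.html"]))       # index.html
print(find_default_document(["Index.html", "index.html.txt"]))  # None
```

The second call shows the two pitfalls from the note: a capitalized name and a hidden .txt extension both prevent the page from being picked up automatically.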

    4. If accented characters display incorrectly later on, there is an encoding problem. Check that the meta tag specifies UTF-8 and that your file is saved in UTF-8.

      If you put a "Fr" label on something written in a strange alphabet that nobody can read, the label is useless. Saving in UTF-8 means choosing the right alphabet at the moment you click "Save".

      Encoding is a setting of the text editor (VS Code, Sublime Text, Notepad++, etc.).

      In VS Code: look at the bottom right of the window, in the blue status bar. It should read "UTF-8". If it reads "Windows-1252" or "Western", there is a problem!

      For example, "é" turns into "Ã©".

      If accents are still broken despite the meta tag:

      Open the file in the editor.

      Click the encoding at the bottom right (often "UTF-8" or "Win1252"), then choose "Save with Encoding".

      Select UTF-8.
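The garbled accents can be reproduced in a few lines. This sketch assumes the mismatch is between UTF-8 and Windows-1252 (Python codec name cp1252), the case mentioned in the note; other encoding pairs produce different garbage.

```python
original = "é"

# Saved as UTF-8, "é" occupies two bytes: 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# Read back as Windows-1252, each byte becomes its own character.
misread = utf8_bytes.decode("cp1252")
print(misread)  # Ã©

# Reversing the wrong decoding recovers the original text.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired)  # é
```

This is why both settings have to agree: the meta tag tells the browser how to decode, and the editor's encoding decides which bytes are actually written to disk.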
      
    5. right-click on any page and select the inspector.

      Try it now! In my browser, the context menu (the one that appears on right-click) offers "View page source". And there you go!

    6. the web browser does the rest of the work: reading the HTML and CSS code

      CSS and HTML are languages (for page layout rather than programming): HTML is a markup language and CSS is a style sheet language.

      They work together to create web pages: HTML supplies the content and CSS the presentation. Wherever there is a language, something has to translate it into a rendered result; here, that is the browser.

      The browser is the interpreter of these languages: it reads and decodes them and turns them into the page we see.

      Computer languages are commonly divided into interpreted and compiled ones; these two are interpreted, by the browser.

    7. GitHub is the most widely used platform for hosting code and collaborating on it.

      It is incredible how people do not know how to learn: they highlight everything except what actually matters. Here is an important piece of information.

    8. Visual Studio Code

      You do not have to install Visual Studio Code; there are plenty of other programs. VSCodium is preferable because it ships without Microsoft's telemetry. Here is a portable version that needs no installation and can simply be put on a USB stick: https://portapps.io/app/vscodium-portable/. There is also Notepad++, which can likewise be put on a USB stick, and so on.

    1. Build info

      I think the build info should include more information about what is already prepared to be "updated", such as analysis parameters. It should also cover open-science aspects such as open data and open code, with links to the GitHub repository, OSF account, etc.

    1. risk factors for PDA, such as chronic pancreatitis or genetic variants of acinar-specific genes (Li et al., 2012; Lowenfels et al., 1993, 1997), may increase PDA risk by rendering acinar cells more plastic and reducing the threshold for ADM.

      Cell plasticity refers to the ability of cells to change their identity, phenotype, or function in response to environmental cues, stress, or developmental signals, without altering their underlying genetic code (genotype).

      This damage and these genetic changes make the cells pliable, so that they can change their phenotype in response to their environment without any change to their DNA.

    1. Child Protection in France: Analysis of the Crisis and CESE Recommendations

      Executive summary

      The child protection system in France is going through a deep, structural crisis that threatens its core missions.

      Although the legislative framework (the laws of 2007, 2016 and 2022) is regarded as one of the most advanced, placing the best interests and fundamental needs of the child at the heart of the system, an alarming gap persists between the ambition of the law and the reality on the ground.

      The critical points identified include a steady rise in needs (49% more minors in care over 20 years), a severe shortage of qualified professionals, and worrying disparities between territories.

      One of the most serious findings is that a significant share of court decisions intended to protect children in danger are not carried out.

      The Conseil économique, social et environnemental (CESE) calls for a national remobilization, strengthened interministerial governance under the authority of the Prime Minister, and a guarantee of equal treatment for all minors, including unaccompanied minors (MNA) and children with disabilities.

      --------------------------------------------------------------------------------

      I. A State of Structural and Statistical Crisis

      A. A worrying rise in demand for protection

      Data from the Observatoire national de la protection de l'enfance (ONPE) and the DREES reveal unprecedented pressure on the child welfare services (Aide Sociale à l'Enfance, ASE):

      Key figures: As of 31 December 2022, 344,682 minors and young adults were in care.

      Trend: The number of young people placed in residential facilities rose by more than 50% between 2011 and 2022.

      Failed shift away from the courts: Despite the intention to favor administrative measures, 82% of care measures for minors result from a court decision.

      B. The link between poverty and child protection

      There is a strong correlation between economic insecurity and child protection intervention. France has a child poverty rate of 20% (33rd out of 39 EU/OECD countries).

      Consequences: 2.9 million children live below the poverty line; 42,000 are homeless.

      Social cost: Traumatic events experienced in childhood cost France about €34.5 billion a year in health expenditure and reduce victims' life expectancy by 20 years.

      --------------------------------------------------------------------------------

      II. Failures of Governance and Funding

      A. National and territorial steering

      Current governance suffers from a lack of interministerial clarity and from major territorial disparities.

      Territorial inequalities: The rate of children in care ranges from 10 per 1,000 in French Guiana to 49 per 1,000 in the Nièvre.

      Funding: Departmental spending on the ASE reached €9.7 billion in 2023. The resources (mainly DMTO, the duties levied on property transfers) are volatile and disconnected from the growth in needs.

      Contracting: The State's financial lever remains marginal (about €140 million through budget program 304) compared with departmental budgets.

      B. Court decisions that are not carried out

      The system relies on understaffed judges (one judge follows 450 to 500 children, against an ideal of 325). For lack of places in facilities, some placement decisions are not carried out, leaving children in danger in their family environment, or are "badly carried out" in unsuitable facilities.

      --------------------------------------------------------------------------------

      III. Guaranteeing the Rights and Needs of the Child

      A. The Projet pour l'Enfant (PPE): an obligation that is not met

      Introduced in 2007, the PPE is meant to be the "compass" of the child's pathway, guaranteeing stability and development. It is nevertheless still not in effect in many departments.

      Recommendation: Make the PPE a precondition for the allocation of State funding.

      B. Health care and disability

      Children in the care of the ASE have mental and physical health conditions more often than other children.

      Psychological emergency: The CESE asks that every protected child be presumed to be in a state of psychological emergency, to ease immediate access to care (CMPP).

      Disability: About 25% of children in care have a disability, but only a third receive suitable medical and social support.

      --------------------------------------------------------------------------------

      IV. Particularly Vulnerable Groups

      A. Unaccompanied minors (MNA): "cut-price" protection

      The CESE denounces an approach increasingly centered on migration policy rather than on child protection.

      Financial discrimination: The daily rate for an MNA is often €50 to €60, against €170 for other minors.

      Assessment of minority: The procedures are judged cursory and rely too often on bone-age tests whose lack of scientific reliability is well established.

      B. Young adults

      Leaving the system at 18 or 21 remains an abrupt break. An Insee study indicates that a quarter of homeless people were formerly children in care.

      --------------------------------------------------------------------------------

      V. The Professionals: A Major Recruitment Crisis

      The sector suffers from staff shortages in every category (educators, foster carers, school doctors).

      Foster carers (assistants familiaux): Their numbers have fallen by 9% in 6 years.

      School medicine: Fewer than 800 doctors for 12 million pupils, which hampers early detection.

      Working conditions: Atypical hours, low pay and the feeling of doing "fragmented work" discourage people from entering the profession.

      --------------------------------------------------------------------------------

      VI. Summary Table of the CESE's Key Recommendations

      | No. | Theme | Main measure |
      | --- | --- | --- |
      | 1 | Statistics | Commission the GIP France Enfance Protégée to produce an exhaustive annual inventory of needs and of measures not carried out. |
      | 2 & 3 | State | Create a biennial interministerial strategy with financial equalization and incentives for the departments. |
      | 4 | Coordination | Roll out the Comités Départementaux pour la Protection de l'Enfance (CDPE) everywhere to break down silos between stakeholders. |
      | 6 | MNA | Prohibit any difference in treatment between MNA and other minors (health, education). |
      | 8 | Training | Define a common training plan for all "sentinel" professionals (national education, police, health). |
      | 9 | Placement | Diversify forms of care by multiplying small living units (fewer than 7 children). |
      | 10 | PPE | Make the "Projet pour l'Enfant" effective and mandatory for any funding. |
      | 11 | Health | Make rapid access to child psychiatry systematic (presumption of psychological emergency). |
      | 13 | Justice | Systematic assistance from a specialized lawyer for the protected child. |
      | 15 | Oversight | Create an independent national authority to inspect care facilities. |
      | 17 | Law | Create a Children's Code bringing together all the rights, freedoms and duties of children. |
      | 18 | Staffing | Publish the decrees on minimum staffing levels and set a maximum number of measures per social worker. |

      --------------------------------------------------------------------------------

      Conclusion

      Child protection can no longer be the adjustment variable for institutional dysfunction.

      The CESE insists that the child must be the subject, not the object, of protection.

      Without massive investment in human resources and genuine coordination between the State and the departments, the republican promise to protect the most vulnerable cannot be kept.

    1. Peter Naur reminded us some decades ago that a program is more than its source code. Rather a program is a theory that lives in the minds of the developer(s) capturing what the program does, how developer intentions are implemented, and how the program can be changed over time. Usually this theory is not just in the minds of one developer but fragments of this theory are distributed across the minds of many, if not thousands, of other developers.

      Peter Naur, Programming as Theory Building 1985 https://doi.org/10.1016/0165-6074(85)90032-8

      Programming as theory building 1985 in Zotero

    2. Technical debt lives in the code; cognitive debt lives in developers' minds

      Image putting techdebt and cognitive debt next to each other. Looks very generated btw. Techdebt described as legacy code, quick fixes, buggy logic: messy code & complexity. Cognitive debt as: lost understanding, knowledge gaps, team confusion: overwhelmed developers.

    3. the humans involved may have simply lost the plot and may not understand what the program is supposed to do, how their intentions were implemented, or how to possibly change it.

      key imo. generating code / material, can quickly mean loss of overview (I see how that happens in my use of #algogens if I don't explicitly counteract it), uncertainty about how demands were implemented, and thus what entry points for change there are.

    4. Technical debt nicely captures that “human understanding” also matters, but the words “technical debt” conjure up the notion that the accrued debt is a property of the code and effort needs to be spent on removing that debt from code.

      While techdebt is about the accumulation of human decisions, and the resulting erosion of human understanding, the term itself suggests it is a property of the code itself, and that one could remove it from code.

    1. I've experienced this myself on some of my more ambitious vibe-code-adjacent projects. I've been experimenting with prompting entire new features into existence without reviewing their implementations and, while it works surprisingly well, I've found myself getting lost in my own projects. I no longer have a firm mental model of what they can do and how they work, which means each additional feature becomes harder to reason about, eventually leading me to lose the ability to make confident decisions about where to go next.

      Vibecoding and adjacent projects lead to losing the overview of your own work: you have no mental model of what you made, as you would have otherwise. Extending something becomes harder over time, bc you don't know what you're actually extending from. This is a counter force (not counter argument I think) to the notion of genAI having deterministic automation as endpoint.

    1. Analysis of Emotional Experience at School: The "Special Moments" Scheme

      Summary

      This summary document details the research carried out by Sophie Necker and her colleagues on capturing emotional states in the classroom.

      Drawing on a study conducted in 2021 in two CM2 classes, the project is built on the "special moments box" scheme.

      The method gives access to the subjectivity of pupils and teachers through the daily, voluntary writing of anonymous notes.

      The findings highlight the systemic dimension of emotions, in which individual experiences intertwine to form a collective emotional landscape.

      The major innovation of this research is the creation of the "Émoscope", a graphic map that visualizes the complexity of the interactions between triggers, subjective appraisals and emotional expressions over the course of a school day.

      --------------------------------------------------------------------------------

      1. The Research Scheme: The Special Moments Box

      The research aims to access traces of emotions and the subjectivity of those involved in school life.

      Methodology and Collection Protocol

      Context: Study carried out in May 2021 in two CM2 classes in Lille (51 pupils and 2 teachers).

      The medium: Strips of paper (about 10 cm high) headed "special moment note".

      The instruction: "You experienced a special moment in class today. Can you write it down and put it in the box, please?"

      Characteristics of the collection:

      ◦ Voluntary, daily writing at the end of the day.

      ◦ Anonymity preserved to encourage freedom of expression.

      ◦ Duration of one month, for a total of 764 notes collected.

      The "special moment": Defined by its singularity and its significance for the individual, with no requirement that it be positive or negative.

      It draws on the concepts of "optimal moments" or "flow", broadened to cover any emotional intensity.

      --------------------------------------------------------------------------------

      2. Theoretical Foundations: A Systemic Approach

      The research treats lived experience as a scientific object in its own right.

      Emotional Interdependence

      The class is viewed as a system of complex, reciprocal interactions:

      Mutual influence: The teacher's emotional states affect those of the pupils, and vice versa.

      Joint attention: How the situation is perceived depends on the attention shared between those involved.

      Pupil-teacher relationship: This relationship influences the quality of school life, behavior and attitudes towards learning.

      Definition of Emotion

      Emotion is understood as a dynamic appraisal process:

      • It allows the individual to specify what a situation means to them.

      • The same situation can give rise to different appraisals depending on the individual or the context.

      The components of the appraisal (after Audrin):

      1. Physiological: Bodily reactions (e.g. shivers).

      2. Motor expression: Facial expressions, voice, posture.

      3. Motivational: Action tendency (approach or flight).

      4. Subjective feeling: Synthesis of the different dimensions.

      --------------------------------------------------------------------------------

      3. Analysis of Results: A Typology of Experiences

      Analysis of the notes reveals several dimensions of how pupils relate to the world of school.

      Relation to Self and to Others

      Self-knowledge: The notes express likes or dislikes ("I hate dancing").

      Sense of competence: Success or difficulty with a task generates salient emotions (pride, stress about assessment).

      Presence of others: Another person can be a trigger (a classmate's presentation), a partner in the emotion or the recipient of an action.

      The teacher is often mentioned indirectly, through their pedagogical and didactic choices.

      Continuity and Rupture

      Comfort zone and continuity: Moments that reinforce the pupil's identity or belong to a comforting social and temporal unit.

      Rupture and irruption: Emotions linked to novelty, to discovering knowledge, to unusual activities or to spatial irruptions (an outside speaker, an outing).

      Emotional Literacy and Verbalization

      The study observes a gradation in pupils' ability to put emotion into words:

      Level 1: Naming only the trigger (e.g. "The story").

      Level 2: Describing the facts or actions.

      Level 3: Conveying the feeling or assigning a value (e.g. "I liked it").

      Level 4: Arguing for the appraisal (e.g. "It's fascinating because...").

      --------------------------------------------------------------------------------

      4. The Émoscope: Mapping the Emotional Landscape

      The major innovation of the research is the creation of the Émoscope, a graphic representation tool.

      | Émoscope feature | Function |
      | --- | --- |
      | Structure | A wheel in which each segment represents an individual note. |
      | Color code | Identifies the triggering event (e.g. sport, class council, presentation). |
      | Pictograms | Indicate the nature of the relation (self, others, rupture, continuity). |
      | Verbatim bubbles | Reproduce the exact words used to describe the emotion. |
      | Arrows | Symbolize the appraisal process and the components identified. |

      This tool makes it possible to move from the analysis of an individual note to an overall view of the class climate over a given unit of time (the day).
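To make the structure in the table more concrete, here is a minimal sketch of how one day of notes might be represented as data. It is not the researchers' tool: the Billet class, its field names, the trigger_shares helper and the sample notes are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Billet:
    trigger: str    # triggering event (the color code)
    relation: str   # nature of the relation (the pictogram)
    verbatim: str   # the writer's exact words (the bubble)


def trigger_shares(billets):
    """Fraction of the wheel occupied by each triggering event."""
    counts = Counter(b.trigger for b in billets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {trigger: n / total for trigger, n in counts.items()}


# Invented sample data for one day (not taken from the study).
day = [
    Billet("sport", "self", "I liked it"),
    Billet("sport", "others", "We won together"),
    Billet("presentation", "others", "It's fascinating"),
    Billet("class council", "continuity", "Like every week"),
]

shares = trigger_shares(day)
print(shares["sport"])  # 0.5
```

Each Billet corresponds to one segment of the wheel, and grouping by trigger is one simple way to move from individual notes to a picture of the whole day.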

      --------------------------------------------------------------------------------

      5. Outlook and Pedagogical Implications

      The research opens up avenues for teacher training and teaching practice.

      For Practitioners and Researchers

      Analysis of practice: Use the Émoscope to compare experiences across teachers or teaching approaches.

      Methodological development: Consider digital formats (audio, video) to remove the barriers linked to writing skills.

      Longitudinal follow-up: Use notebooks of notes to track a pupil's emotional development over the long term.

      For Training

      Awareness: Help future teachers understand the systemic nature of emotions in the classroom.

      Learning indicator: Explore pupils' emotions as markers of progress and emotional security.

      Conclusion of the Study

      The special moments box shows that emotions, although subjective, can be captured and mapped.

      They are an essential way into understanding the dynamics of learning and well-being within the educational community.

    1. code-switching is a phenomenon that increasing numbers of people are likely to experience. Claire Kramsch describes this phenomenon as ‘language crossings’ (1998). She provides examples which highlight complex manifestations of identity enactment;

      Code-switching happens a lot in English in the US, depending on the social group you are with.

    1. Front-load your learning; learn passively, not actively.

      Kids learn from seeing things modeled. Learning by osmosis is what some people call it, and honestly it's what I do for a lot of things: I surrounded myself with smart, nice people and paid attention to the stuff they talked about, and eventually I learned to code with no classes and, frankly, very little active research.

    1. The Scientific State of Play of Manual Therapies: Between Myths and Realities

      Executive Summary

      This summary document analyzes the current state of scientific knowledge on manual therapies (physiotherapy, osteopathy, chiropractic, etiopathy), with particular emphasis on back pain, the main reason for consultation.

      The key points are as follows:

      The primacy of movement: Modern science shows that the most effective treatment for low back pain is active movement.

      Passive therapies should not be used in isolation.

      Legal and ethical obligations: Unlike pseudo-medicines, physiotherapy is bound by the obligation to use means consistent with the "données acquises de la science" (established scientific knowledge), a legal principle rooted in the Mercier ruling of 1936.

      Debunking myths: The notions of a "displaced vertebra" or a "misaligned pelvis" are figments of the imagination with no anatomical reality.

      Manual palpation, although reassuring, is not scientifically reliable enough to diagnose tissue texture or a "blockage".

      Risks and social consequences: Beyond the placebo or contextual effect, some manipulations (cervical ones in particular) carry serious risks such as stroke.

      In addition, these practices can interfere with public health messages and undermine patients' health literacy.

      --------------------------------------------------------------------------------

      1. How the Science of Back Pain Has Evolved

      The medical approach to low back pain has changed radically over the past thirty years, shifting from a logic of rest to a logic of action.

      Timeline of paradigm shifts

      1986: A study in the New England Journal of Medicine suggests that two days of bed rest are more beneficial than seven.

      1995: A pivotal study shows that the "control" group (which carried on living normally) recovers better than the groups assigned to strict rest or to overly cautious exercises.

      2019: The Haute Autorité de Santé (HAS) and the Assurance Maladie issue official recommendations: "Le bon traitement, c'est le mouvement" ("The right treatment is movement").

      Passive therapies used in isolation are declared ineffective on the course of low back pain.

      The physiological benefit of movement

      Contrary to popular belief, activities such as running improve disc physiology.

      The alternation of compression and decompression (about 1 Hz) during running helps hydrate the intervertebral discs. Statistically, long-distance runners suffer less from back pain than other athletes.

      --------------------------------------------------------------------------------

      2. Legal and Ethical Framework: Science as an Obligation

      The distinction between physiotherapy and alternative therapies rests on a long-standing legal foundation.

      The Mercier Ruling (1936)

      This landmark decision of the Cour de cassation established three major principles:

      1. The care contract: A contractual relationship exists between the practitioner and the patient.

      2. The obligation of means: The practitioner has no obligation of result (a cure) but must deploy all necessary means.

      3. Established scientific knowledge ("données acquises de la science"): The means chosen must be consistent with current scientific knowledge.

      Evolution of physiotherapy practice

      The code of ethics requires physiotherapists to abandon practices that have been invalidated. For example:

      Bronchiolitis: Pediatric respiratory physiotherapy has no longer been recommended since 2019 for otherwise healthy infants, because the benefit is judged insufficient relative to the traumatic nature of the treatment.

      Massage: Its use is now limited (scars, edema), and it is no longer recommended as a first-line treatment for back pain.

      --------------------------------------------------------------------------------

      3. Critical Analysis of Manual Therapies

      The limits of palpation and manual diagnosis

      Science shows that practitioners' sense of touch is prone to illusion.

      Lack of reliability: Two assessors rarely agree on the texture (hard/soft) or the "blocked" character of a tissue.

      Anatomical accuracy: Even when palpating an obvious structure under the skin, the average error is 5 cm.

      Mechanical impossibility: It is impossible to mobilize a single vertebra in isolation; a manipulation affects at least three.

      "Gate control" effect and placebo

      Manual therapies produce a real but transient analgesic effect:

      Sensory distraction: The nervous system prioritizes touch, heat, and cold sensations over pain. This is a short-term effect (a few minutes to a few hours).

      Contextual effect: The ritual of the consultation, the practitioner's attention, and natural regression to the mean (the pain often subsides on its own around the time one seeks care) reinforce the illusion of effectiveness.

      --------------------------------------------------------------------------------

      4. History and Foundations of Manual Pseudo-medicines

      Therapies such as osteopathy and chiropractic are based on vitalism, a nineteenth-century philosophy positing the existence of a non-physical "vital force".

      | Discipline | Origin | Ideological foundations | Current state in Europe |
      | --- | --- | --- | --- |
      | Osteopathy | A.T. Still (1874) | "The body is God's pharmacy". Blood flow equated with health. | "Purist" branch (Littlejohn) very present, focused on craniosacral and "fluidic" approaches. |
      | Chiropractic | D.D. Palmer (1895) | Central nervous system as master of the body. Use of high-velocity manipulations ("cracking"). | Practice has stayed close to its original concepts, with a strong presence on social media. |
      | Etiopathy | C. Trédaniel (France) | Seeks the origin of the pathology in joint adjustment. | Very similar to osteopathy, with no real scientific distinction. |

      Note on the American exception: In the United States, osteopathy became medicalized following the Flexner Report (1910). "DOs" there are general practitioners who hardly practice manual therapy anymore, unlike the European branch, which has remained mystical.

      --------------------------------------------------------------------------------

      5. Risks and Societal Impacts

      Safety and loss of chance

      Serious risks: Cervical manipulations can cause vertebral artery dissections, leading to stroke or locked-in syndrome (total paralysis with preserved consciousness).

      Diagnostic errors: Turning directly to these therapies without medical advice can delay the treatment of serious conditions (e.g., undetected fractures).

      Interference with the medical message

      The "medical veneer" these disciplines adopt (words such as "diagnosis", "anamnesis", "consultation") creates confusion among patients:

      Damage to health literacy: By anchoring mistaken concepts (displaced vertebra, one leg shorter than the other), practitioners create dependence and a fear of movement (kinesiophobia).

      Social factors: The main factor in the persistence of low back pain is not mechanical but linked to job dissatisfaction or societal problems. Manual therapies, by focusing on "crack and go", ignore this complexity.

      Conclusion

      While manual therapies offer temporary relief and relational comfort, they are not a lasting solution to back pain.

      Science recommends an approach centered on therapeutic education, motivation management and, imperatively, the patient's active movement.

    1. Critical Thinking at the Heart of Specialized Private Investigation: An Analysis of Benoît Judde's Practices

      This summary document analyzes the remarks of Benoît Judde, a specialized private detective, on the evolution of the detective profession in France, the legal framework governing sectarian abuses (dérives sectaires), and the use of critical thinking as a fundamental methodological tool for establishing evidence.

      Summary

      The private detective profession in France, now strictly regulated and overseen by the Ministry of the Interior (through the CNAPS), has become a de facto auxiliary for the defense of private interests and for the judicial system.

      Benoît Judde, who specializes in cases of manipulation and sectarian abuse, shows that an investigator's effectiveness rests on rigorous command of the legal framework and on the application of critical thinking.

      This approach, grounded in experimental cognitive and social psychology, makes it possible to turn subjective phenomena such as "psychological subjection" into objective, substantiated evidence admissible in court.

      The recent (2024) elevation of psychological subjection to a standalone offense reinforces the need for technical expertise capable of characterizing manipulative maneuvers without falling into confirmation bias.

      --------------------------------------------------------------------------------

      1. The Legal and Ethical Framework of the Profession

      The profession of private detective, officially termed "agent de recherche privée" (private research agent), is defined by the Code de la sécurité intérieure (CSI).

      Definition and Prerogatives

      Under Article L621-1 of the CSI, the detective is a self-employed professional whose mission is to gather information or intelligence intended for third parties, for the defense of their interests.

      Investigative anonymity: It is the only para-legal profession authorized to investigate without revealing its status, its real identity, or the purpose of its mission. Unlike commissaires de justice (bailiffs), a detective may act under a fictitious identity.

      Admissibility of evidence: Detective reports must be "détaillés, circonstanciés et précis" (detailed, substantiated, and precise; DCP) to be admissible in court, according to Cour de cassation case law dating from 1962.

      Regulation and Training

      The profession has moved from a "freestyle" state to strict oversight:

      CNAPS oversight: The Conseil national des activités privées de sécurité (under the authority of the Ministry of the Interior) issues three separate authorizations (individual, legal entity, professional card), renewable every 5 years after a thorough background check.

      Mandatory training: A Bac+3 level (licence professionnelle, a three-year vocational bachelor's degree) is required. There are only four schools in France (two universities and two private schools), training about 120 new professionals per year.

      Ethics: Detectives are bound by professional secrecy and a duty to advise. In particular, they must verify the legitimacy of a request to avoid serving plans for revenge or malicious searches.

      --------------------------------------------------------------------------------

      2. Specialized Investigation into Sectarian Abuses

      Detectives' field of action is broad (tracing persons, counterfeiting, insurance fraud), but Benoît Judde specializes in mental manipulation.

      The MIVILUDES Criteria

      To objectively establish sectarian abuse, the investigator relies on the reference framework of the Mission interministérielle de vigilance et de lutte contre les dérives sectaires (MIVILUDES), which identifies 10 main criteria.

      | Category of harm | Examples of sub-criteria |
      | --- | --- |
      | Harm to persons | Break with the original environment, loss of critical thinking, indoctrination of children, deprivation of sleep or food. |
      | Harm to property | Disproportionate financial demands, indebtedness, undeclared work (e.g., misuse of the WWOOFing concept). |
      | Social and democratic life | Antisocial discourse, disturbance of public order, diversion of economic channels. |

      Interdisciplinary Collaboration

      The investigator works in tandem with a psychologist (specialized in scientific, cognitive, and social psychology) to confirm that the psychological hold (emprise) is real.

      This collaboration provides a credible "psychological voice" that neither the lawyer nor the detective can supply alone, in particular to characterize the harm or the subjection before a judge.

      --------------------------------------------------------------------------------

      3. Recent Legislative Developments (2024 Law)

      The French legal framework has recently evolved to make sectarian abuses easier to prosecute, making the role of evidence both more complex and more crucial.

      Psychological subjection as a standalone offense: Previously tied to abuse of weakness (which required proving a pre-existing state of weakness and harm), "placing a person in a state of psychological subjection" became a standalone offense in 2024.

      It is now sufficient to prove the use of pressure or manipulation techniques that impair judgment.

      Diversion from medical treatment: A new offense punishes inciting a person to abandon a therapeutic or prophylactic medical treatment (such as vaccination) in favor of pseudo-scientific practices.

      Fraud and cyber-malice: In the digital realm, 95% of scams rely on social engineering (human manipulation) rather than on purely technical flaws.

      --------------------------------------------------------------------------------

      4. Critical Thinking as an Investigative Methodology

      For Benoît Judde, critical thinking is not an intellectual posture but a working tool that helps avoid confirmation bias and ensures the objectivity of the report.

      The Three Pillars of Manipulation

      The investigator analyzes situations through three mechanisms identified by experimental psychology:

      1. Self-manipulation: Exploitation of individuals' natural cognitive biases.

      2. Freely consented submission: Techniques such as the "foot in the door" (obtaining a small commitment in order to obtain a larger one) or the "door in the face" (asking for the excessive in order to obtain the reasonable).

      3. Submission to authority: A reference to the Milgram experiment. Manipulation succeeds if the authority is perceived as legitimate (e.g., wearing a white coat, the title "brother of Jesus", etc.).

      The Objectivity of Evidence

      Use of technology: Hidden cameras are used during infiltrations to provide raw, incontestable evidence, thereby avoiding the fallibility of human memory and accusations of bias.

      Necessity and proportionality: The investigator must justify that the intrusion on privacy (infiltration, surveillance) was strictly indispensable to establishing the truth and proportionate to what was at stake (right to evidence vs. right to privacy).

      --------------------------------------------------------------------------------

      5. Conclusion: Toward a Security Continuum

      The document stresses that the State cannot by itself monitor every risk, particularly in the complex areas of sectarian and therapeutic abuses.

      Public-private synergy: The private detective steps in where the police can no longer act (disappearances not deemed worrying, pre-criminal investigations to strengthen a complaint).

      Auxiliary of justice: By contributing elements grounded in scientific consensus (experimental psychology), the detective helps the judge base a decision on facts rather than on contradictory testimony.

      Complementarity: The goal is not an "Americanization" of the system but mutual validation, in which the private sector complements the State's sovereign action by providing specific technical and field expertise.

    1. THE AMERICAN YAWP

      6. A New Nation

      “The Federal Pillars,” from The Massachusetts Centinel, August 2, 1789. Library of Congress.

      *The American Yawp is an evolving, collaborative text.*

      I. Introduction | II. Shays’s Rebellion | III. The Constitutional Convention | IV. Ratifying the Constitution | V. Rights and Compromises | VI. Hamilton’s Financial System | VII. The Whiskey Rebellion and Jay’s Treaty | VIII. The French Revolution and the Limits of Liberty | IX. Religious Freedom | X. The Election of 1800 | XI. Conclusion | XII. Primary Sources | XIII. Reference Material

      I. Introduction

      On July 4, 1788, Philadelphians turned out for a “grand federal procession” in honor of the new national constitution. Workers in various trades and professions demonstrated. Blacksmiths carted around a working forge, on which they symbolically beat swords into farm tools. Potters proudly carried a sign paraphrasing from the Bible, “The potter hath power over his clay,” linking God’s power with an artisan’s work and a citizen’s control over the country. Christian clergymen meanwhile marched arm-in-arm with Jewish leaders. The grand procession represented what many Americans hoped the United States would become: a diverse but cohesive, prosperous nation.1

      Over the next few years, Americans would celebrate more of these patriotic holidays. In April 1789, for example, thousands gathered in New York to see George Washington take the presidential oath of office.
      That November, Washington called his fellow citizens to celebrate with a day of thanksgiving, particularly for “the peaceable and rational manner” in which the government had been established.2

      But the new nation was never as cohesive as its champions had hoped. Although the officials of the new federal government—and the people who supported it—placed great emphasis on unity and cooperation, the country was often anything but unified. The Constitution itself had been a controversial document adopted to strengthen the government so that it could withstand internal conflicts. Whatever the later celebrations, the new nation had looked to the future with uncertainty. Less than two years before the national celebrations of 1788 and 1789, the United States had faced the threat of collapse.

      II. Shays’s Rebellion

      Daniel Shays became a divisive figure, to some a violent rebel seeking to upend the new American government, to others an upholder of the true revolutionary virtues Shays and others fought for. This contemporary depiction of Shays and his accomplice Job Shattuck portrays them in the latter light as rising “illustrious from the Jail.” Unidentified artist, Daniel Shays and Job Shattuck, 1787. Wikimedia.

      In 1786 and 1787, a few years after the Revolution ended, thousands of farmers in western Massachusetts were struggling under a heavy burden of debt. Their problems were made worse by weak local and national economies. Many political leaders saw both the debt and the struggling economy as a consequence of the Articles of Confederation, which provided the federal government with no way to raise revenue and did little to create a cohesive nation out of the various states. The farmers wanted the Massachusetts government to protect them from their creditors, but the state supported the lenders instead. As creditors threatened to foreclose on their property, many of these farmers, including Revolutionary War veterans, took up arms.
      Led by a fellow veteran named Daniel Shays, these armed men, the “Shaysites,” resorted to tactics like the patriots had used before the Revolution, forming blockades around courthouses to keep judges from issuing foreclosure orders. These protesters saw their cause and their methods as an extension of the “Spirit of 1776”; they were protecting their rights and demanding redress for the people’s grievances.

      Governor James Bowdoin, however, saw the Shaysites as rebels who wanted to rule the government through mob violence. He called up thousands of militiamen to disperse them. A former Revolutionary general, Benjamin Lincoln, led the state force, insisting that Massachusetts must prevent “a state of anarchy, confusion and slavery.”3 In January 1787, Lincoln’s militia arrested more than one thousand Shaysites and reopened the courts.

      Daniel Shays and other leaders were indicted for treason, and several were sentenced to death, but eventually Shays and most of his followers received pardons. Their protest, which became known as Shays’s Rebellion, generated intense national debate. While some Americans, like Thomas Jefferson, thought “a little rebellion now and then” helped keep the country free, others feared the nation was sliding toward anarchy and complained that the states could not maintain control. For nationalists like James Madison of Virginia, Shays’s Rebellion was a prime example of why the country needed a strong central government. “Liberty,” Madison warned, “may be endangered by the abuses of liberty as well as the abuses of power.”4

      III. The Constitutional Convention

      The uprising in Massachusetts convinced leaders around the country to act. After years of goading by James Madison and other nationalists, delegates from twelve of the thirteen states met at the Pennsylvania state house in Philadelphia in the summer of 1787. Only Rhode Island declined to send a representative.
      The delegates arrived at the convention with instructions to revise the Articles of Confederation.

      The biggest problem the convention needed to solve was the federal government’s inability to levy taxes. That weakness meant that the burden of paying back debt from the Revolutionary War fell on the states. The states, in turn, found themselves beholden to the lenders who had bought up their war bonds. That was part of why Massachusetts had chosen to side with its wealthy bondholders over poor western farmers.5

      James Madison, however, had no intention of simply revising the Articles of Confederation. He intended to produce a completely new national constitution. In the preceding year, he had completed two extensive research projects—one on the history of government in the United States, the other on the history of republics around the world. He used this research as the basis for a proposal he brought with him to Philadelphia. It came to be called the Virginia Plan, named after Madison’s home state.6

      James Madison was a central figure in the reconfiguration of the national government. Madison’s Virginia Plan was a guiding document in the formation of a new government under the Constitution. John Vanderlyn, Portrait of James Madison, 1816. Wikimedia.

      The Virginia Plan was daring. Classical learning said that a republican form of government required a small and homogenous state: the Roman republic, or a small country like Denmark, for example. Citizens who were too far apart or too different could not govern themselves successfully. Conventional wisdom said the United States needed to have a very weak central government, which should simply represent the states on certain matters they had in common. Otherwise, power should stay at the state or local level. But Madison’s research had led him in a different direction. He believed it was possible to create “an extended republic” encompassing a diversity of people, climates, and customs.
      The Virginia Plan, therefore, proposed that the United States should have a strong federal government. It was to have three branches—legislative, executive, and judicial—with power to act on any issues of national concern. The legislature, or Congress, would have two houses, in which every state would be represented according to its population size or tax base. The national legislature would have veto power over state laws.7

      Other delegates to the convention generally agreed with Madison that the Articles of Confederation had failed. But they did not agree on what kind of government should replace them. In particular, they disagreed about the best method of representation in the new Congress. Representation was an important issue that influenced a host of other decisions, including deciding how the national executive branch should work, what specific powers the federal government should have, and even what to do about the divisive issue of slavery.

      For more than a decade, each state had enjoyed a single vote in the Continental Congress. William Paterson’s New Jersey Plan proposed to keep things that way. The Connecticut delegate Roger Sherman, furthermore, argued that members of Congress should be appointed by the state legislatures. Ordinary voters, Sherman said, lacked information, were “constantly liable to be misled” and “should have as little to do as may be” about most national decisions.8

      Large states, however, preferred the Virginia Plan, which would give their citizens far more power over the legislative branch. James Wilson of Pennsylvania argued that since the Virginia Plan would vastly increase the powers of the national government, representation should be drawn as directly as possible from the public. No government, he warned, “could long subsist without the confidence of the people.”9

      Ultimately, Roger Sherman suggested a compromise.
      Congress would have a lower house, the House of Representatives, in which members were assigned according to each state’s population, and an upper house, which became the Senate, in which each state would have one vote. This proposal, after months of debate, was adopted in a slightly altered form as the Great Compromise: each state would have two senators, who could vote independently. In addition to establishing both types of representation, this compromise also counted three-fifths of a state’s enslaved population for representation and tax purposes.

      The delegates took even longer to decide on the form of the national executive branch. Should executive power be in the hands of a committee or a single person? How should its officeholders be chosen? On June 1, James Wilson moved that the national executive power reside in a single person. Coming only four years after the American Revolution, that proposal was extremely contentious; it conjured up images of an elected monarchy.10 The delegates also worried about how to protect the executive branch from corruption or undue control. They endlessly debated these questions, and not until early September did they decide the president would be elected by a special electoral college.

      In the end, the Constitutional Convention proposed a government unlike any other, combining elements copied from ancient republics and English political tradition but making some limited democratic innovations—all while trying to maintain a delicate balance between national and state sovereignty. It was a complicated and highly controversial scheme.

      IV. Ratifying the Constitution

      Delegates to the Constitutional Convention assembled, argued, and finally agreed in this room, styled in the same manner as during the Convention. Photograph of the Assembly Room, Independence Hall, Philadelphia, Pennsylvania. Wikimedia. Creative Commons Attribution-Share Alike 3.0 Unported.

      The convention voted to send its proposed Constitution to Congress, which was then sitting in New York, with a cover letter from George Washington. The plan for adopting the new Constitution, however, required approval from special state ratification conventions, not just Congress. During the ratification process, critics of the Constitution organized to persuade voters in the different states to oppose it.

      Importantly, the Constitutional Convention had voted down a proposal from Virginia’s George Mason, the author of Virginia’s state Declaration of Rights, for a national bill of rights. This omission became a rallying point for opponents of the document. Many of these Anti-Federalists argued that without such a guarantee of specific rights, American citizens risked losing their personal liberty to the powerful federal government. The pro-ratification Federalists, on the other hand, argued that including a bill of rights was not only redundant but dangerous; it could limit future citizens from adding new rights.11

      Citizens debated the merits of the Constitution in newspaper articles, letters, sermons, and coffeehouse quarrels across America. Some of the most famous, and most important, arguments came from Alexander Hamilton, John Jay, and James Madison in the Federalist Papers, which were published in various New York newspapers in 1787 and 1788.12

      The first crucial vote came at the beginning of 1788 in Massachusetts. At first, the Anti-Federalists at the Massachusetts ratifying convention probably had the upper hand, but after weeks of debate, enough delegates changed their votes to narrowly approve the Constitution. But they also approved a number of proposed amendments, which were to be submitted to the first Congress. This pattern—ratifying the Constitution but attaching proposed amendments—was followed by other state conventions.
      The most high-profile convention was held in Richmond, Virginia, in June 1788, when Federalists like James Madison, Edmund Randolph, and John Marshall squared off against equally influential Anti-Federalists like Patrick Henry and George Mason. Virginia was America’s most populous state, it had produced some of the country’s highest-profile leaders, and the success of the new government rested upon its cooperation. After nearly a month of debate, Virginia voted 89 to 79 in favor of ratification.13

      On July 2, 1788, Congress announced that a majority of states had ratified the Constitution and that the document was now in effect. Yet this did not mean the debates were over. North Carolina, New York, and Rhode Island had not completed their ratification conventions, and Anti-Federalists still argued that the Constitution would lead to tyranny. The New York convention would ratify the Constitution by just three votes, and finally Rhode Island would ratify it by two votes—a full year after George Washington was inaugurated as president.

      V. Rights and Compromises

      Although debates continued, Washington’s election as president cemented the Constitution’s authority. By 1793, the term Anti-Federalist would be essentially meaningless. Yet the debates produced a piece of the Constitution that seems irreplaceable today. Ten amendments were added in 1791. Together, they constitute the Bill of Rights. James Madison, against his original wishes, supported these amendments as an act of political compromise and necessity. He had won election to the House of Representatives only by promising his Virginia constituents such a list of rights.

      There was much the Bill of Rights did not cover. Women found no special protections or guarantee of a voice in government. Many states continued to restrict voting only to men who owned significant amounts of property. And slavery not only continued to exist; it was condoned and protected by the Constitution.
Of all the compromises that formed the Constitution, perhaps none would be more important than the compromise over the slave trade. Americans generally perceived the transatlantic slave trade as more violent and immoral than slavery itself. Many northerners opposed it on moral grounds. But they also understood that letting southern states import more Africans would increase their political power. The Constitution counted each enslaved individual as three fifths of a person for purposes of representation, so in districts with many enslaved people, the white voters had extra influence. On the other hand, the states of the Upper South also welcomed a ban on the Atlantic trade because they already had a surplus of enslaved laborers. Banning importation meant enslavers in Virginia and Maryland could get higher prices when they sold their enslaved laborers to states like South Carolina and Georgia that were dependent on a continued slave trade. New England and the Deep South agreed to what was called a “dirty compromise” at the Constitutional Convention in 1787. New Englanders agreed to include a constitutional provision that protected the foreign slave trade for twenty years; in exchange, South Carolina and Georgia delegates had agreed to support a constitutional clause that made it easier for Congress to pass commercial legislation. As a result, the Atlantic slave trade resumed until 1808 when it was outlawed for three reasons. First, Britain was also in the process of outlawing the slave trade in 1807, and the United States did not want to concede any moral high ground to its rival. Second, the Haitian Revolution (1791–1804), a successful slave revolt against French colonial rule in the West Indies, had changed the stakes in the debate. The image of thousands of armed Black revolutionaries terrified white Americans. 
Third, the Haitian Revolution had ended France’s plans to expand its presence in the Americas, so in 1803, the United States had purchased the Louisiana Territory from the French at a fire-sale price. This massive new territory, which had doubled the size of the United States, had put the question of slavery’s expansion at the top of the national agenda. Many white Americans, including President Thomas Jefferson, thought that ending the external slave trade and dispersing the domestic slave population would keep the United States a white man’s republic and perhaps even lead to the disappearance of slavery.

The ban on the slave trade, however, lacked effective enforcement measures and funding. Moreover, instead of freeing illegally imported Africans, the act left their fate to the individual states, and many of those states simply sold intercepted enslaved people at auction. Thus, the ban preserved the logic of property ownership in human beings. The new federal government protected slavery as much as it expanded democratic rights and privileges for white men.14

VI. Hamilton’s Financial System

Alexander Hamilton saw America’s future as a metropolitan, commercial, industrial society, in contrast to Thomas Jefferson’s nation of small farmers. While both men had the ear of President Washington, Hamilton’s vision proved most appealing and enduring. John Trumbull, Portrait of Alexander Hamilton, 1806. Wikimedia.

President George Washington’s cabinet choices reflected continuing political tensions over the size and power of the federal government. The vice president was John Adams, and Washington chose Alexander Hamilton to be his secretary of the treasury. Both men wanted an active government that would promote prosperity by supporting American industry. However, Washington chose Thomas Jefferson to be his secretary of state, and Jefferson was committed to restricting federal power and preserving an economy based on agriculture.
Almost from the beginning, Washington struggled to reconcile the Federalist and Republican (or Democratic-Republican) factions within his own administration.15 Alexander Hamilton believed that self-interest was the “most powerful incentive of human actions.” Self-interest drove humans to accumulate property, and that effort created commerce and industry. According to Hamilton, government had important roles to play in this process. First, the state should protect private property from theft. Second, according to Hamilton, the state should use human “passions” and “make them subservient to the public good.”16 In other words, a wise government would harness its citizens’ desire for property so that both private individuals and the state would benefit. Hamilton, like many of his contemporary statesmen, did not believe the state should ensure an equal distribution of property. Inequality was understood as “the great & fundamental distinction in Society,” and Hamilton saw no reason why this should change. Instead, Hamilton wanted to tie the economic interests of wealthy Americans, or “monied men,” to the federal government’s financial health. If the rich needed the government, then they would direct their energies to making sure it remained solvent.17 Hamilton, therefore, believed that the federal government must be “a Repository of the Rights of the wealthy.”18 As the nation’s first secretary of the treasury, he proposed an ambitious financial plan to achieve just that. The first part of Hamilton’s plan involved federal “assumption” of state debts, which were mostly left over from the Revolutionary War. The federal government would assume responsibility for the states’ unpaid debts, which totaled about $25 million. Second, Hamilton wanted Congress to create a bank—a Bank of the United States. The goal of these proposals was to link federal power and the country’s economic vitality. 
Under the assumption proposal, the states’ creditors (people who owned state bonds or promissory notes) would turn their old notes in to the treasury and receive new federal notes of the same face value. Hamilton foresaw that these bonds would circulate like money, acting as “an engine of business, and instrument of industry and commerce.”19 This part of his plan, however, was controversial for two reasons. First, many taxpayers objected to paying the full face value on old notes, which had fallen in market value. Often the current holders had purchased them from the original creditors for pennies on the dollar. To pay them at full face value, therefore, would mean rewarding speculators at taxpayer expense. Hamilton countered that government debts must be honored in full, or else citizens would lose all trust in the government. Second, many southerners objected that they had already paid their outstanding state debts, so federal assumption would mean forcing them to pay again for the debts of New Englanders. Nevertheless, President Washington and Congress both accepted Hamilton’s argument. By the end of 1794, 98 percent of the country’s domestic debt had been converted into new federal bonds.20 Hamilton’s plan for a Bank of the United States, similarly, won congressional approval despite strong opposition. Thomas Jefferson and other Republicans argued that the plan was unconstitutional; the Constitution did not authorize Congress to create a bank. Hamilton, however, argued that the bank was not only constitutional but also important for the country’s prosperity. The Bank of the United States would fulfill several needs. It would act as a convenient depository for federal funds. It would print paper banknotes backed by specie (gold or silver). Its agents would also help control inflation by periodically taking state bank notes to their banks of origin and demanding specie in exchange, limiting the amount of notes the state banks printed. 
Furthermore, it would give wealthy people a vested interest in the federal government’s finances. The government would control just 20 percent of the bank’s stock; the other 80 percent would be owned by private investors. Thus, an “intimate connexion” between the government and wealthy men would benefit both, and this connection would promote American commerce.

In 1791, therefore, Congress approved a twenty-year charter for the Bank of the United States. The bank’s stocks, together with federal bonds, created over $70 million in new financial instruments. These spurred the formation of securities markets, which allowed the federal government to borrow more money and underwrote the rapid spread of state-chartered banks and other private business corporations in the 1790s. For Federalists, this was one of the major purposes of the federal government. For opponents who wanted a more limited role for industry, however, or who lived on the frontier and lacked access to capital, Hamilton’s system seemed to reinforce class boundaries and give the rich inordinate power over the federal government.

Hamilton’s plan, furthermore, had another highly controversial element. In order to pay what it owed on the new bonds, the federal government needed reliable sources of tax revenue. In 1791, Hamilton proposed a federal excise tax on the production, sale, and consumption of a number of goods, including whiskey.

VII. The Whiskey Rebellion and Jay’s Treaty

Grain was the most valuable cash crop for many American farmers. In the West, selling grain to a local distillery for alcohol production was typically more profitable than shipping it over the Appalachians to eastern markets. Hamilton’s whiskey tax thus placed a special burden on western farmers. It seemed to divide the young republic in half: geographically between the East and West, economically between merchants and farmers, and culturally between cities and the countryside.
In the fall of 1791, sixteen men in western Pennsylvania, disguised in women’s clothes, assaulted a tax collector named Robert Johnson. They tarred and feathered him, and the local deputy marshals seeking justice met similar fates. They were robbed and beaten, whipped and flogged, tarred and feathered, and tied up and left for dead. The rebel farmers also adopted other protest methods from the Revolution and Shays’s Rebellion, writing local petitions and erecting liberty poles. For the next two years, tax collections in the region dwindled. Then, in July 1794, groups of armed farmers attacked federal marshals and tax collectors, burning down at least two tax collectors’ homes. At the end of the month, an armed force of about seven thousand, led by the radical attorney David Bradford, robbed the U.S. mail and gathered about eight miles east of Pittsburgh. President Washington responded quickly. First, Washington dispatched a committee of three distinguished Pennsylvanians to meet with the rebels and try to bring about a peaceful resolution. Meanwhile, he gathered an army of thirteen thousand militiamen in Carlisle, Pennsylvania. On September 19, Washington became the only sitting president to lead troops in the field, though he quickly turned over the army to the command of Henry Lee, a Revolutionary hero and the current governor of Virginia. As the federal army moved westward, the farmers scattered. Hoping to make a dramatic display of federal authority, Alexander Hamilton oversaw the arrest and trial of a number of rebels. Many were released because of a lack of evidence, and most of those who remained, including two men sentenced to death for treason, were soon pardoned by the president. The Whiskey Rebellion had shown that the federal government was capable of quelling internal unrest. But it also demonstrated that some citizens, especially poor westerners, viewed it as their enemy.21 Around the same time, another national issue also aroused fierce protest. 
Along with his vision of a strong financial system, Hamilton also had a vision of a nation busily engaged in foreign trade. In his mind, that meant pursuing a friendly relationship with one nation in particular: Great Britain. America’s relationship with Britain since the end of the Revolution had been tense, partly because of warfare between the British and French. Their naval war threatened American shipping, and the impressment of men into Britain’s navy terrorized American sailors. American trade could be risky and expensive, and impressment threatened seafaring families. Nevertheless, President Washington was conscious of American weakness and was determined not to take sides. In April 1793, he officially declared that the United States would remain neutral.22 With his blessing, Hamilton’s political ally John Jay, who was currently serving as chief justice of the Supreme Court, sailed to London to negotiate a treaty that would satisfy both Britain and the United States. Jefferson and Madison strongly opposed these negotiations. They mistrusted Britain and saw the treaty as the American state favoring Britain over France. The French had recently overthrown their own monarchy, and Republicans thought the United States should be glad to have the friendship of a new revolutionary state. They also suspected that a treaty with Britain would favor northern merchants and manufacturers over the agricultural South. In November 1794, despite their misgivings, John Jay signed a “treaty of amity, commerce, and navigation” with the British. Jay’s Treaty, as it was commonly called, required Britain to abandon its military positions in the Northwest Territory (especially Fort Detroit, Fort Mackinac, and Fort Niagara) by 1796. Britain also agreed to compensate American merchants for their losses. The United States, in return, agreed to treat Britain as its most prized trade partner, which meant tacitly supporting Britain in its current conflict with France. 
Unfortunately, Jay had failed to secure an end to impressment.23

For Federalists, this treaty was a significant accomplishment. Jay’s Treaty gave the United States, a relatively weak power, the ability to stay officially neutral in European wars, and it preserved American prosperity by protecting trade. For Jefferson’s Republicans, however, the treaty was proof of Federalist treachery. The Federalists had sided with a monarchy against a republic, and they had submitted to British influence in American affairs without even ending impressment. In Congress, debate over the treaty transformed the Federalists and Republicans from temporary factions into two distinct (though still loosely organized) political parties.

VIII. The French Revolution and the Limits of Liberty

The mounting body count of the French Revolution included that of the queen and king, who were beheaded in a public ceremony in early 1793, as depicted in the engraving. While Americans disdained the concept of monarchy, the execution of King Louis XVI was regarded by many Americans as an abomination, an indication of the chaos and savagery reigning in France at the time. Charles Monnet (artist), Antoine-Jean Duclos and Isidore-Stanislas Helman (engravers), Day of 21 January 1793 the death of Louis Capet on the Place de la Révolution, 1794. Wikimedia.

In part, the Federalists were turning toward Britain because they feared the most radical forms of democratic thought. In the wake of Shays’s Rebellion, the Whiskey Rebellion, and other internal protests, Federalists sought to preserve social stability. The course of the French Revolution seemed to justify their concerns.

In 1789, news had arrived in America that the French had revolted against their king. Most Americans imagined that liberty was spreading from America to Europe, carried there by the returning French heroes who had taken part in the American Revolution. Initially, nearly all Americans had praised the French Revolution.
Towns all over the country hosted speeches and parades on July 14 to commemorate the day it began. Women had worn neoclassical dress to honor republican principles, and men had pinned revolutionary cockades to their hats. John Randolph, a Virginia planter, named two of his favorite horses Jacobin and Sans-Culotte after French revolutionary factions.24 In April 1793, a new French ambassador, “Citizen” Edmond-Charles Genêt, arrived in the United States. During his tour of several cities, Americans greeted him with wild enthusiasm. Citizen Genêt encouraged Americans to act against Spain, a British ally, by attacking its colonies of Florida and Louisiana. When President Washington refused, Genêt threatened to appeal to the American people directly. In response, Washington demanded that France recall its diplomat. In the meantime, however, Genêt’s faction had fallen from power in France. Knowing that a return home might cost him his head, he decided to remain in America. Genêt’s intuition was correct. A radical coalition of revolutionaries had seized power in France. They initiated a bloody purge of their enemies, the Reign of Terror. As Americans learned about Genêt’s impropriety and the mounting body count in France, many began to have second thoughts about the French Revolution. Americans who feared that the French Revolution was spiraling out of control tended to become Federalists. Those who remained hopeful about the revolution tended to become Republicans. Not deterred by the violence, Thomas Jefferson declared that he would rather see “half the earth desolated” than see the French Revolution fail. “Were there but an Adam and an Eve left in every country, and left free,” he wrote, “it would be better than as it now is.”25 Meanwhile, the Federalists sought closer ties with Britain. Despite the political rancor, in late 1796 there came one sign of hope: the United States peacefully elected a new president. 
For now, as Washington stepped down and executive power changed hands, the country did not descend into the anarchy that many leaders feared. The new president was John Adams, Washington’s vice president. Adams was less beloved than the old general, and he governed a deeply divided nation. The foreign crisis also presented him with a major test. In response to Jay’s Treaty, the French government authorized its vessels to attack American shipping. To resolve this, President Adams sent envoys to France in 1797. The French insulted these diplomats. Some officials, whom the Americans code-named X, Y, and Z in their correspondence, hinted that negotiations could begin only after the Americans offered a bribe. When the story became public, this XYZ Affair infuriated American citizens. Dozens of towns wrote addresses to President Adams, pledging him their support against France. Many people seemed eager for war. “Millions for defense,” toasted South Carolina representative Robert Goodloe Harper, “but not one cent for tribute.”26 By 1798, the people of Charleston watched the ocean’s horizon apprehensively because they feared the arrival of the French navy at any moment. Many people now worried that the same ships that had aided Americans during the Revolutionary War might discharge an invasion force on their shores. Some southerners were sure that this force would consist of Black troops from France’s Caribbean colonies, who would attack the southern states and cause their enslaved laborers to revolt. Many Americans also worried that France had covert agents in the country. In the streets of Charleston, armed bands of young men searched for French disorganizers. Even the little children prepared for the looming conflict by fighting with sticks.27 Meanwhile, during the crisis, New Englanders were some of the most outspoken opponents of France. In 1798, they found a new reason for Francophobia. 
An influential Massachusetts minister, Jedidiah Morse, announced to his congregation that the French Revolution had been hatched in a conspiracy led by a mysterious anti-Christian organization called the Illuminati. The story was a hoax, but rumors of Illuminati infiltration spread throughout New England like wildfire, adding a new dimension to the foreign threat.28 Against this backdrop of fear, the French Quasi-War, as it would come to be known, was fought on the Atlantic, mostly between French naval vessels and American merchant ships. During this crisis, however, anxiety about foreign agents ran high, and members of Congress took action to prevent internal subversion. The most controversial of these steps were the Alien and Sedition Acts. These two laws, passed in 1798, were intended to prevent French agents and sympathizers from compromising America’s resistance, but they also attacked Americans who criticized the president and the Federalist Party. The Alien Act allowed the federal government to deport foreign nationals, or “aliens,” who seemed to pose a national security threat. Even more dramatically, the Sedition Act allowed the government to prosecute anyone found to be speaking or publishing “false, scandalous, and malicious writing” against the government.29 These laws were not simply brought on by war hysteria. They reflected common assumptions about the nature of the American Revolution and the limits of liberty. In fact, most of the advocates for the Constitution and the First Amendment accepted that free speech simply meant a lack of prior censorship or restraint, not a guarantee against punishment. According to this logic, “licentious” or unruly speech made society less free, not more. James Wilson, one of the principal architects of the Constitution, argued that “every author is responsible when he attacks the security or welfare of the government.”30 In 1798, most Federalists were inclined to agree. 
Under the terms of the Sedition Act, they indicted and prosecuted several Republican printers—and even a Republican congressman who had criticized President Adams. Meanwhile, although the Adams administration never enforced the Alien Act, its passage was enough to convince some foreign nationals to leave the country. For the president and most other Federalists, the Alien and Sedition Acts represented a continuation of a conservative rather than radical American Revolution. However, the Alien and Sedition Acts caused a backlash in two ways. First, shocked opponents articulated a new and expansive vision for liberty. The New York lawyer Tunis Wortman, for example, demanded an “absolute independence” of the press.31 Likewise, the Virginia judge George Hay called for “any publication whatever criminal” to be exempt from legal punishment.32 Many Americans began to argue that free speech meant the ability to say virtually anything without fear of prosecution. Second, James Madison and Thomas Jefferson helped organize opposition from state governments. Ironically, both of them had expressed support for the principle behind the Sedition Act in previous years. Jefferson, for example, had written to Madison in 1789 that the nation should punish citizens for speaking “false facts” that injured the country.33 Nevertheless, both men now opposed the Alien and Sedition Acts on constitutional grounds. In 1798, Jefferson made this point in a resolution adopted by the Kentucky state legislature. A short time later, the Virginia legislature adopted a similar document written by Madison. The Kentucky and Virginia Resolutions argued that the national government’s authority was limited to the powers expressly granted by the U.S. Constitution. More importantly, they asserted that the states could declare federal laws unconstitutional. For the time being, these resolutions were simply gestures of defiance. Their bold claim, however, would have important effects in later decades. 
In just a few years, many Americans’ feelings toward France had changed dramatically. Far from rejoicing in the “light of freedom,” many Americans now feared the “contagion” of French-style liberty. Debates over the French Revolution in the 1790s gave Americans some of their earliest opportunities to articulate what it meant to be American. Did American national character rest on a radical and universal vision of human liberty? Or was America supposed to be essentially pious and traditional, an outgrowth of Great Britain? They couldn’t agree. It was on this cracked foundation that many conflicts of the nineteenth century would rest.

IX. Religious Freedom

One reason the debates over the French Revolution became so heated was that Americans were unsure about their own religious future. The Illuminati scare of 1798 was just one manifestation of this fear. Across the United States, a slow but profound shift in attitudes toward religion and government began.

In 1776, none of the American state governments observed the separation of church and state. On the contrary, all thirteen states either had established, official, and tax-supported state churches, or at least required their officeholders to profess a certain faith. Most officials believed this was necessary to protect morality and social order. Over the next six decades, however, that changed. In 1833, the final state, Massachusetts, stopped supporting an official religious denomination. Historians call that gradual process disestablishment.

In many states, the process of disestablishment had started before the creation of the Constitution. South Carolina, for example, had been nominally Anglican before the Revolution, but it had dropped denominational restrictions in its 1778 constitution. Instead, it now allowed any church consisting of at least fifteen adult males to become “incorporated,” or recognized for tax purposes as a state-supported church.
Churches needed only to agree to a set of basic Christian theological tenets, which were vague enough that most denominations could support them.34 South Carolina tried to balance religious freedom with the religious practice that was supposed to be necessary for social order. Officeholders were still expected to be Christians; their oaths were witnessed by God, they were compelled by their religious beliefs to tell the truth, and they were called to live according to the Bible. This list of minimal requirements came to define acceptable Christianity in many states. As new Christian denominations proliferated between 1780 and 1840, however, more and more Christians fell outside this definition. South Carolina continued its general establishment law until 1790, when a constitutional revision removed the establishment clause and religious restrictions on officeholders. Many other states, though, continued to support an established church well into the nineteenth century. The federal Constitution did not prevent this. The religious freedom clause in the Bill of Rights, during these decades, limited the federal government but not state governments. It was not until 1833 that a state supreme court decision ended Massachusetts’s support for the Congregational Church. Many political leaders, including Thomas Jefferson and James Madison, favored disestablishment because they saw the relationship between church and state as a tool of oppression. Jefferson proposed a Statute for Religious Freedom in the Virginia state assembly in 1779, but his bill failed in the overwhelmingly Anglican legislature. Madison proposed it again in 1785, and it defeated a rival bill that would have given equal revenue to all Protestant churches. Instead Virginia would not use public money to support religion. 
“The Religion then of every man,” Madison wrote, “must be left to the conviction and conscience of every man; and it is the right of every man to exercise it as these may dictate.”35

At the federal level, the delegates to the Constitutional Convention of 1787 easily agreed that the national government should not have an official religion. This principle was upheld in 1791 when the First Amendment was ratified, with its guarantee of religious liberty. The limits of federal disestablishment, however, required discussion. The federal government, for example, supported Native American missionaries and congressional chaplains. Well into the nineteenth century, debate raged over whether the postal service should operate on Sundays, and whether non-Christians could act as witnesses in federal courts. Americans continued to struggle to understand what it meant for Congress not to “establish” a religion.

X. The Election of 1800

The year 1800 brought about a host of changes in government, in particular the first successful and peaceful transfer of power from one political party to another. But the year was important for another reason: the U.S. Capitol in Washington, D.C. (pictured here in 1800) was finally opened to be occupied by Congress, the Supreme Court, the Library of Congress, and the courts of the District of Columbia. William Russell Birch, A view of the Capitol of Washington before it was burnt down by the British, c. 1800. Wikimedia.

Meanwhile, the Sedition and Alien Acts expired in 1800 and 1801. They had been relatively ineffective at suppressing dissent. On the contrary, they were much more important for the loud reactions they had inspired. They had helped many Americans decide what they didn’t want from their national government.

By 1800, therefore, President Adams had lost the confidence of many Americans. They had let him know it. In 1798, for instance, he had issued a national thanksgiving proclamation.
Instead of enjoying a day of celebration and thankfulness, Adams and his family had been forced by rioters to flee the capital city of Philadelphia until the day was over. Conversely, his prickly independence had also put him at odds with Alexander Hamilton, the leader of his own party, who offered him little support. After four years in office, Adams found himself widely reviled. In the election of 1800, therefore, the Republicans defeated Adams in a bitter and complicated presidential race. During the election, one Federalist newspaper article predicted that a Republican victory would fill America with “murder, robbery, rape, adultery, and incest.”36 A Republican newspaper, on the other hand, flung sexual slurs against President Adams, saying he had “neither the force and firmness of a man, nor the gentleness and sensibility of a woman.” Both sides predicted disaster and possibly war if the other should win.37 In the end, the contest came down to a tie between two Republicans, Thomas Jefferson of Virginia and Aaron Burr of New York, who each had seventy-three electoral votes. (Adams had sixty-five.) Burr was supposed to be a candidate for vice president, not president, but under the Constitution’s original rules, a tie-breaking vote had to take place in the House of Representatives. It was controlled by Federalists bitter at Jefferson. House members voted dozens of times without breaking the tie. On the thirty-sixth ballot, Thomas Jefferson emerged victorious. Republicans believed they had saved the United States from grave danger. An assembly of Republicans in New York City called the election a “bloodless revolution.” They thought of their victory as a revolution in part because the Constitution (and eighteenth-century political theory) made no provision for political parties. The Republicans thought they were fighting to rescue the country from an aristocratic takeover, not just taking part in a normal constitutional process. 
This image attacks Jefferson’s support of the French Revolution and religious freedom. The letter, “To Mazzei,” refers to a 1796 correspondence that criticized the Federalists and, by association, President Washington. Providential Detection, 1797. Courtesy American Antiquarian Society. Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

In his first inaugural address, however, Thomas Jefferson offered an olive branch to the Federalists. He pledged to follow the will of the American majority, who he believed were Republicans, but to respect the rights of the Federalist minority. His election set an important precedent. Adams accepted his electoral defeat and left the White House peacefully. “The revolution of 1800,” Jefferson wrote years later, did for American principles what the Revolution of 1776 had done for its structure. But this time, the revolution was accomplished not “by the sword” but “by the rational and peaceable instrument of reform, the suffrage of the people.”38 Four years later, when the Twelfth Amendment changed the rules for presidential elections to prevent future deadlocks, it was designed to accommodate the way political parties worked.

Despite Adams’s and Jefferson’s attempts to tame party politics, though, the tension between federal power and the liberties of states and individuals would exist long into the nineteenth century. And while Jefferson’s administration attempted to decrease federal influence, Chief Justice John Marshall, an Adams appointee, worked to increase the authority of the Supreme Court. These competing agendas clashed most famously in the 1803 case of Marbury v. Madison, which Marshall used to establish a major precedent.

The Marbury case seemed insignificant at first. The night before leaving office in early 1801, Adams had appointed several men to serve as justices of the peace in Washington, D.C.
By making these “midnight appointments,” Adams had sought to put Federalists into vacant positions at the last minute. On taking office, however, Jefferson and his secretary of state, James Madison, had refused to deliver the federal commissions to the men Adams had appointed. Several of the appointees, including William Marbury, sued the government, and the case was argued before the Supreme Court. Marshall used Marbury’s case to make a clever ruling. On the issue of the commissions, the Supreme Court ruled in favor of the Jefferson administration. But Chief Justice Marshall went further in his decision, ruling that the Supreme Court reserved the right to decide whether an act of Congress violated the Constitution. In other words, the court assumed the power of judicial review. This was a major (and lasting) blow to the Republican agenda, especially after 1810, when the Supreme Court extended judicial review to state laws. Jefferson was particularly frustrated by the decision, arguing that the power of judicial review “would make the Judiciary a despotic branch.”39   XI. Conclusion A grand debate over political power engulfed the young United States. The Constitution ensured that there would be a strong federal government capable of taxing, waging war, and making law, but it could never resolve the young nation’s many conflicting constituencies. The Whiskey Rebellion proved that the nation could stifle internal dissent but exposed a new threat to liberty. Hamilton’s banking system provided the nation with credit but also constrained frontier farmers. The Constitution’s guarantee of religious liberty conflicted with many popular prerogatives. Dissension only deepened, and as the 1790s progressed, Americans became bitterly divided over political parties and foreign war. During the ratification debates, Alexander Hamilton had written of the wonders of the Constitution. 
“A nation, without a national government,” he wrote, would be “an awful spectacle.” But, he added, “the establishment of a Constitution, in time of profound peace, by the voluntary consent of a whole people, is a prodigy,” a miracle that should be witnessed “with trembling anxiety.”40 Anti-Federalists had grave concerns about the Constitution, but even they could celebrate the idea of national unity. By 1795, even the staunchest critics would have grudgingly agreed with Hamilton’s convictions about the Constitution. Yet these same individuals could also take the cautions in Washington’s 1796 farewell address to heart. “There is an opinion,” Washington wrote, “that parties in free countries are useful checks upon the administration of the government and serve to keep alive the spirit of liberty.” This, he conceded, was probably true, but in a republic, he said, the danger was not too little partisanship, but too much. “A fire not to be quenched,” Washington warned, “it demands a uniform vigilance to prevent its bursting into a flame, lest, instead of warming, it should consume.”41 For every parade, thanksgiving proclamation, or grand procession honoring the unity of the nation, there was also some political controversy reminding American citizens of how fragile their union was. And as party differences and regional quarrels tested the federal government, the new nation increasingly explored the limits of its democracy.   XII. Primary Sources 1. Hector St. Jean de Crèvecœur describes the American people, 1782 Hector St. John de Crèvecœur was born in France, but relocated to the colony of New York and married a local woman named Mehitable Tippet. For a period of several years, de Crèvecœur wrote about the people he encountered in North America. The resulting work was widely successful in Europe. In this passage, Crèvecœur attempts to reflect on the difference between life in Europe and life in North America. 2. 
A Confederation of Native peoples seek peace with the United States, 1786 In 1786, half a year before the Constitutional Convention, a collection of Native American leaders gathered on the banks of the Detroit River to offer a unified message to the Congress of the United States. Despite this proposal, American surveyors, settlers, and others continued to cross the Ohio River. 3. Mary Smith Cranch comments on politics, 1786-87 In the aftermath of the Revolution, politics became a sport consumed by both men and women. In a series of letters sent to her sister, Mary Smith Cranch comments on a series of political events including the lack of support for diplomats, the circulation of paper or hard currency, legal reform, tariffs against imported tea tables, Shays’s rebellion, and the role of women in supporting the nation’s interests. 4. James Madison, Memorial and Remonstrance Against Religious Assessments, 1785 Before the American Revolution, Virginia supported local Anglican churches through taxes. After the American Revolution, Virginia had to decide what to do with this policy. Some founding fathers, including Patrick Henry, wanted to equally distribute tax dollars to all churches. In this document, James Madison explains why he did not want any government money to support religious causes in Virginia. 5. George Washington, “Farewell Address,” 1796 George Washington used his final public address as president to warn against what he understood as the two greatest dangers to American prosperity: political parties and foreign wars. Washington urged the American people to avoid political partisanship and entanglements with European wars.  6. Venture Smith, A Narrative of the Life and Adventures of Venture Smith, 1798 Venture Smith’s autobiography is one of the earliest slave narratives to circulate in the Atlantic World. Slave narratives grew into the most important genre of antislavery literature and bore testimony to the injustices of the slave system. 
Smith was unusually lucky in that he was able to purchase his freedom, but his story nonetheless reveals the hardships faced by even the most fortunate enslaved men and women. 7. Susannah Rowson, Charlotte Temple, 1794 In Charlotte Temple, the first novel written in America, Susannah Rowson offered a cautionary tale of a woman deceived and then abandoned by a roguish man. Americans throughout the new nation read the book with rapt attention and many even traveled to New York City to visit the supposed grave of this fictional character. 8. Constitutional ratification cartoon, 1789 The Massachusetts Centinel ran a series of cartoons depicting the ratification of the Constitution.  Each vertical pillar represents a state that has ratified the new government.  In this cartoon, North Carolina’s pillar is being guided into place (it would vote for ratification in November 1789).  Rhode Island’s pillar, however, is crumbling and shows the uncertainty of the vote there.    9. Anti-Thomas Jefferson Cartoon, 1797 This image attacks Jefferson’s support of the French Revolution and religious freedom.  The Altar to “Gallic Despotism” mocks Jefferson’s allegiance to the French. The letter, “To Mazzei,” refers to a 1796 correspondence that criticized the Federalists and, by association, President Washington.    XIII. Reference Material This chapter was edited by Tara Strauch, with content contributions by Marco Basile, Nathaniel C. Green, Brenden Kennedy, Spencer McBride, Andrea Nero, Cara Rogers, Tara Strauch, Michael Harrison Taylor, Jordan Taylor, Kevin Wisniewski, and Ben Wright. Recommended citation: Marco Basile et al., “A New Nation,” Tara Strauch, ed., in The American Yawp, eds. Joseph Locke and Ben Wright (Stanford, CA: Stanford University Press, 2018).   Recommended Reading Allgor, Catherine. Parlor Politics: In Which the Ladies of Washington Help Build a City and a Government. Charlottesville: University of Virginia Press, 2000. Appleby, Joyce. Inheriting the Revolution: The First Generation of Americans. Cambridge, MA: Belknap Press, 2001. Bartolini-Tuazon, Kathleen. For Fear of an Elective King: George Washington and the Presidential Title Controversy of 1789. Ithaca, NY: Cornell University Press, 2014. Beeman, Richard, Stephen Botein, and Edward C. Carter II, eds. Beyond Confederation: Origins of the Constitution and American National Identity. 
Chapel Hill: University of North Carolina Press, 1987. Bilder, Mary Sarah. Madison’s Hand: Revising the Constitutional Convention. Cambridge, MA: Harvard University Press, 2015. Bouton, Terry. “A Road Closed: Rural Insurgency in Post-Independence Pennsylvania.” Journal of American History 87, no. 3 (December 2000): 855–887. Cunningham, Noble E. The Jeffersonian Republicans: The Formation of Party Organization, 1789–1801. Chapel Hill: University of North Carolina Press, 1967. Dunn, Susan. Jefferson’s Second Revolution: The Election of 1800 and the Triumph of Republicanism. Boston: Houghton Mifflin, 2004. Edling, Max. A Revolution in Favor of Government: Origins of the U.S. Constitution and the Making of the American State. New York: Oxford University Press, 2003. Gordon-Reed, Annette. The Hemingses of Monticello: An American Family. New York: Norton, 2008. Halperin, Terri Diane. The Alien and Sedition Acts of 1798: Testing the Constitution. Baltimore: Johns Hopkins University Press, 2016. Holton, Woody. Unruly Americans and the Origins of the Constitution. New York: Hill and Wang, 2007. Kierner, Cynthia A. Martha Jefferson Randolph, Daughter of Monticello: Her Life and Times. Chapel Hill: University of North Carolina Press, 2012. Maier, Pauline. Ratification: The People Debate the Constitution, 1787–1788. New York: Simon and Schuster, 2010. Papenfuse, Eric Robert. “Unleashing the ‘Wildness’: The Mobilization of Grassroots Antifederalism in Maryland.” Journal of the Early Republic 16, no. 1 (Spring 1996): 73–106. Pasley, Jeffrey L. The First Presidential Contest: 1796 and the Founding of American Democracy. Lawrence: University of Kansas Press, 2013. Rakove, Jack N. Original Meanings: Politics and Ideas in the Making of the Constitution. New York: Vintage Books, 1996. Salmon, Marylynn. Women and the Law of Property in Early America. Chapel Hill: University of North Carolina Press, 1989. Sharp, James Roger. 
American Politics in the Early Republic: The New Nation in Crisis. New Haven, CT: Yale University Press, 1993. Slaughter, Thomas P. The Whiskey Rebellion: Frontier Epilogue to the American Revolution. New York: Oxford University Press, 1986. Smith-Rosenberg, Carroll. “Dis-Covering the Subject of the ‘Great Constitutional Discussion,’ 1786–1789.” Journal of American History 79, no. 3 (December 1992): 841–873. Taylor, Alan. William Cooper’s Town: Power and Persuasion on the Frontier of the Early American Republic. New York: Vintage, 1996. Waldstreicher, David. In the Midst of Perpetual Fetes: The Making of American Nationalism, 1776–1820. Chapel Hill: University of North Carolina Press, 1997. Wood, Gordon. Empire of Liberty: A History of the Early Republic, 1789–1815. Oxford, UK: Oxford University Press, 2011. Zagarri, Rosemarie. Revolutionary Backlash: Women and Politics in the Early American Republic. Philadelphia: University of Pennsylvania Press, 2007.

Notes
1. Francis Hopkinson, An Account of the Grand Federal Procession, Philadelphia, July 4, 1788 (Philadelphia: Carey, 1788).
2. George Washington, Thanksgiving Proclamation, October 3, 1789; Fed. Reg., Presidential Proclamations, 1791–1991.
3. Hampshire Gazette (CT), September 13, 1786.
4. James Madison, The Federalist Papers (New York: Signet Classics, 2003), no. 63.
5. Woody Holton, Unruly Americans and the Origins of the Constitution (New York: Hill and Wang, 2007), 8–9.
6. Madison took an active role during the convention. He also did more than anyone else to shape historians’ understandings of the convention by taking meticulous notes. Many of the quotes included here come from Madison’s notes. To learn more about this important document, read Mary Sarah Bilder, Madison’s Hand: Revising the Constitutional Convention (Cambridge, MA: Harvard University Press, 2015).
7. Virginia (Randolph) Plan as Amended (National Archives Microfilm Publication M866, 1 roll); The Official Records of the Constitutional Convention; Records of the Continental and Confederation Congresses and the Constitutional Convention, 1774–1789, Record Group 360; National Archives.
8. Richard Beeman, Plain, Honest Men: The Making of the American Constitution (New York: Random House, 2009), 114.
9. Herbert J. Storing, What the Anti-Federalists Were For: The Political Thought of the Opponents of the Constitution (Chicago: University of Chicago Press, 1981), 16.
10. Ray Raphael, Mr. President: How and Why the Founders Created a Chief Executive (New York: Knopf, 2012), 50. See also Kathleen Bartoloni-Tuazon, For Fear of an Elective King: George Washington and the Presidential Title Controversy of 1789 (Ithaca, NY: Cornell University Press, 2014).
11. David J. Siemers, Ratifying the Republic: Antifederalists and Federalists in Constitutional Time (Stanford, CA: Stanford University Press, 2002).
12. Alexander Hamilton, James Madison, and John Jay, The Federalist Papers, ed. Ian Shapiro (New Haven, CT: Yale University Press, 2009).
13. Pauline Maier, Ratification: The People Debate the Constitution, 1787–1788 (New York: Simon and Schuster, 2010), 225–237.
14. David Waldstreicher, Slavery’s Constitution: From Revolution to Ratification (New York: Hill and Wang, 2009).
15. Carson Holloway, Hamilton Versus Jefferson in the Washington Administration: Completing the Founding or Betraying the Founding? (New York: Cambridge University Press, 2015).
16. Alexander Hamilton, The Works of Alexander Hamilton, Volume 1, ed. Henry Cabot Lodge (New York: Putnam, 1904), 70, 408.
17. Alexander Hamilton, Report on Manufactures (New York: Childs and Swaine, 1791).
18. James H. Hutson, ed., Supplement to Max Farrand’s the Records of the Federal Convention of 1787 (New Haven, CT: Yale University Press, 1987), 119.
19. Hamilton, Report on Manufactures.
20. Richard Sylla, “National Foundations: Public Credit, the National Bank, and Securities Markets,” in Founding Choices: American Economic Policy in the 1790s, ed. Douglas A. Irwin and Richard Sylla (Chicago: University of Chicago Press, 2011), 68.
21. Thomas P. Slaughter, The Whiskey Rebellion: Frontier Epilogue to the American Revolution (New York: Oxford University Press, 1986).
22. “Proclamation of Neutrality, 1793,” in A Compilation of the Messages and Papers of the Presidents Prepared Under the Direction of the Joint Committee on printing, of the House and Senate Pursuant to an Act of the Fifty-Second Congress of the United States (New York: Bureau of National Literature, 1897).
23. United States, Treaty of Amity, Commerce, and Navigation, signed at London November 19, 1794, Submitted to the Senate June 8, Resolution of Advice and Consent, on condition, June 24, 1795. Ratified by the United States August 14, 1795. Ratified by Great Britain October 28, 1795. Ratifications exchanged at London October 28, 1795. Proclaimed February 29, 1796.
24. Elizabeth Fox-Genovese and Eugene D. Genovese, The Mind of the Master Class: History and Faith in the Southern Slaveholders Worldview (New York: Cambridge University Press, 2005), 18.
25. “From Thomas Jefferson to William Short, 3 January 1793,” Founders Online, National Archives. http://founders.archives.gov/documents/Jefferson/01-25-02-0016, last modified June 29, 2015; The Papers of Thomas Jefferson, vol. 25, 1 January–10 May 1793, ed. John Catanzariti (Princeton, NJ: Princeton University Press, 1992), 14–17.
26. Robert Goodloe Harper, June 18, 1798, quoted in American Daily Advertiser (Philadelphia), June 20, 1798.
27. Robert J. Alderson Jr., This Bright Era of Happy Revolutions: French Consul Michel-Ange-Bernard Mangourit and International Republicanism in Charleston, 1792–1794 (Columbia: University of South Carolina Press, 2008).
28. Rachel Hope Cleves, The Reign of Terror in America: Visions of Violence from Anti-Jacobinism to Antislavery (New York: Cambridge University Press, 2012), 47.
29. Alien Act, July 6, 1798, and An Act in Addition to the Act, Entitled “An Act for the Punishment of Certain Crimes Against the United States,” July 14, 1798; Fifth Congress; Enrolled Acts and Resolutions; General Records of the United States Government; Record Group 11; National Archives.
30. James Wilson, Congressional Debate, December 1, 1787, in Jonathan Elliot, ed., The Debates in the Several State Conventions on the Adoption of the Federal Constitution as Recommended by the General Convention at Philadelphia in 1787, Vol. 2 (New York: s.n., 1888) 448–450.
31. Tunis Wortman, A Treatise Concerning Political Enquiry, and the Liberty of the Press (New York: Forman, 1800), 181.
32. George Hay, An Essay on the Liberty of the Press (Philadelphia: s.n., 1799), 43.
33. Thomas Jefferson to James Madison, August 28, 1789, from The Works of Thomas Jefferson in Twelve Volumes, Federal Edition, ed. Paul Leicester Ford. http://www.loc.gov/resource/mtj1.011_0853_0861
34. Francis Newton Thorpe, ed., The Federal and State Constitutions, Colonial Charters, and Other Organic Laws of the States, Territories, and Colonies Now or Heretofore Forming the United States of America Compiled and Edited Under the Act of Congress of June 30, 1906 (Washington, DC: U.S. Government Printing Office, 1909).
35. Thomas Jefferson, An Act for Establishing Religious Freedom, 16 January 1786, Manuscript, Records of the General Assembly, Enrolled Bills, Record Group 78, Library of Virginia.
36. Catherine Allgor, Parlor Politics: In Which the Ladies of Washington Help Build a City and a Government (Charlottesville: University of Virginia Press, 2000), 14.
37. James T. Callender, The Prospect Before Us (Richmond: s.n., 1800).
38. Letter from Thomas Jefferson to Spencer Roane, September 6, 1819, in The Writings of Thomas Jefferson, 20 vols., ed. Albert Ellery Bergh (Washington, DC: Thomas Jefferson Memorial Association of the United States, 1903), 142.
39. Harold H. Bruff, Untrodden Ground: How Presidents Interpret the Constitution (Chicago: University of Chicago Press, 2015), 65.
40. Alexander Hamilton, The Federalist Papers (New York: Signet Classics, 2003), no. 85.
41. George Washington, Farewell Address, Annals of Congress, 4th Congress, 2869–2870.

This entry was posted in Uncategorized on June 7, 2013 by All Chapters.

      The discussion of Shays’s Rebellion reveals how economic struggles and weak national power under the Articles of Confederation created serious unrest among farmers. While some leaders viewed the rebellion as a dangerous threat to order, others believed it represented the same revolutionary spirit that founded the country.

    1. In 1847, when the American Medical Association (AMA) published its first Code of Medical Ethics, it agreed, stating that “The life of a sick person can be shortened not only by the acts, but also by the words or manner of a physician. It is, therefore, a sacred duty to guard himself carefully in this respect, and to avoid all things which have a tendency to discourage the patient and depress his spirits.”


    1. Summary Brief: The Psychology of Commitment

      Executive Summary

      This document summarizes the key concepts of the psychology of commitment as presented by Professor Fabien Girandola.

      The central thesis is that traditional persuasion, based on information and argument, is largely ineffective at producing lasting changes in behavior.

      By contrast, commitment theory proposes a counterintuitive but powerful approach:

      • lead individuals to perform a first act, low in cost and freely chosen, so that they become bound to that act, and
      • prompt them to adopt more significant behaviors afterward.

      Techniques such as the "foot-in-the-door" and "labeling," validated by decades of experimental research, show that it is possible to influence actions by structuring the situation rather than by trying to change minds.

      A major psychological effect of these techniques is "naturalization": individuals attribute their new behavior to their own nature ("I am altruistic") without being aware of the situational manipulation that is its real cause.

      Mastery of these techniques raises fundamental ethical questions, because they sit on the line between influence and manipulation.

      1. The Ineffectiveness of Persuasion: The Gap Between Opinion and Behavior

      The classic approach to changing behavior relies on persuasion: the idea that supplying information and convincing arguments will change people's opinions, which in turn will change their actions.

      1.1. The Premise of Persuasion

      The persuasive approach assumes a direct causal chain:

      1. Information: Present facts (e.g., "Tobacco kills").

      2. Conviction: The individual takes in the information and changes their opinion.

      3. Action: The individual adjusts their behavior to make it consistent with the new opinion.

      1.2. Evidence of Failure

      Decades of research in social psychology, going back to the 1960s, show that this link is weak or even nonexistent.

      Knowing something does not guarantee acting in line with that knowledge.

      Common examples:

      ◦ Smokers know that tobacco is harmful but keep smoking.

      ◦ Most people agree that ecology matters but adopt few pro-environmental behaviors.

      Bickman's experiment (1972): This seminal study illustrates the gap between stated opinion and actual behavior.

      | Phase of the experiment | Result |
      | --- | --- |
      | Opinion survey | 95% of passersby say that it is important to keep the streets clean. |
      | Real-life test | Faced with a piece of litter to pick up in the street, only 2% of the same people do so. |

      This foundational experiment shows that endorsing an idea (cleanliness) does not automatically translate into action.

      2. Commitment Theory: Act First, Think Later

      Given the limits of persuasion, commitment theory, developed notably by researchers such as Kiesler, Jean-Léon Beauvois, and Robert-Vincent Joule, proposes reversing the logic.

      Instead of targeting opinions in order to change acts, it targets acts in order to influence later opinions and behaviors.

      2.1. Definition and Principles

      Definition (Kiesler, 1971): Commitment is "the bond that ties the individual to their act."

      Fundamental principle: It is not the individual who commits of their own accord; it is the situation that commits them.

      The goal is to lead a person to perform small, progressive acts that draw them toward more costly behaviors they would not have carried out spontaneously.

      2.2. Key Factors of Commitment

      For a situation to be committing, several factors must be present.

      | Factor | Description | Example |
      | --- | --- | --- |
      | Sense of freedom | This is the most crucial factor. The individual must feel that they freely chose to perform the act. Phrases such as "You are free to accept or refuse" or "Do as you wish" considerably increase the acceptance rate, because they create a sense of freedom even when that freedom is constrained by the context. | Asking someone to sign a petition and adding "but you are free to refuse" raises the acceptance rate from 15% to 45%. |
      | Public nature | An act performed publicly (signing a petition, speaking up) is more committing than a private act. The name and signature left behind tie the individual to the action. | Signing a petition with one's full name. |
      | Repetition of the act | Repeating a behavior strengthens the bond of commitment. After lending an object several times, it becomes hard to refuse. | Lending a tool to a neighbor every week. |
      | Cost of the act | An act that requires effort or sacrifice (in time, money, or energy) is more committing. | Lending one's car is more committing than lending a pen. |
      | Labeling (internal attribution) | Attributing a quality to a person ("I know you are helpful") commits them to behaving in line with that label. The act then seems "natural" to the individual. | Telling someone "You are a really good person." |

      Important note: Commitment does not work in the presence of rewards or punishments.

      If a person is paid or threatened into doing something, the act is attributed not to an internal decision but to the external constraint.

      There is therefore no psychological commitment.

      3. Techniques of Freely Consented Submission

      These theoretical principles have been turned into concrete techniques for inducing behavior, grouped under the paradoxical name "freely consented submission":

      the individual submits to a request while feeling that they acted freely.

      3.1. The Foot-in-the-Door: Ask for Little to Get More

      This is the best-known technique.

      It consists of getting a first, very low-cost request accepted (the preparatory act) in order to significantly raise the chances that the person will accept a second, much more costly request (the target behavior).

      Freedman & Fraser experiment (1966), Scenario 1: The home survey

      | Experimental condition | Request | Acceptance rate |
      | --- | --- | --- |
      | Control | Direct request: Accept a 2-3 hour visit from a team of surveyors who will go through the house. | 22% |
      | Foot-in-the-door | 1. Preparatory act: Answer a short telephone questionnaire (accepted by everyone). 2. Final request (3 days later): Accept the visit from the team of surveyors. | 53% |

      Freedman & Fraser experiment (1966), Scenario 2: The sign in the yard

      | Experimental condition | Request | Acceptance rate |
      | --- | --- | --- |
      | Control | Direct request: Put up a large 4x4 m road-safety sign in one's yard. | 17% |
      | Foot-in-the-door | 1. Preparatory act: Place a small road-safety sticker on one's window (accepted by everyone). 2. Final request (3 days later): Agree to put up the large sign. | 76% |
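The size of the foot-in-the-door effect in the two Freedman & Fraser scenarios can be made explicit with a little arithmetic. The sketch below is illustrative only: it uses the four acceptance rates reported above, while the dictionary layout and the `uplift` helper are assumptions introduced here, not part of the original study or presentation.

```python
# Acceptance rates (in %) as reported in the two scenarios above.
# The dictionary keys and layout are illustrative assumptions.
RATES = {
    "home survey": {"control": 22, "foot_in_the_door": 53},
    "yard sign": {"control": 17, "foot_in_the_door": 76},
}


def uplift(control, treated):
    """Return (gain in percentage points, multiplicative factor) over control."""
    return treated - control, treated / control


if __name__ == "__main__":
    for scenario, rates in RATES.items():
        gain, factor = uplift(rates["control"], rates["foot_in_the_door"])
        print(f"{scenario}: +{gain} points, x{factor:.1f}")
```

On these figures, the preparatory act multiplies acceptance by roughly 2.4 in the home-survey scenario and roughly 4.5 in the yard-sign scenario.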

      3.2. Labeling and the Implicit Foot-in-the-Door

      This approach combines the preparatory act with praise of the person, prompting them to carry out a costly behavior on their own initiative, without being explicitly asked.

      Joule et al. experiment (2002): The lost banknote in Aix-en-Provence

      The target behavior is altruism: returning a 10 € note that has fallen from a confederate's pocket.

      The preparatory act consists of giving directions on a map to a "tourist" (another confederate).

      The key variable is how the tourist thanks the person.

      | Condition | The "tourist's" reply after being helped | Note return rate |
      | --- | --- | --- |
      | Control | No prior interaction with the tourist. | 30% |
      | Foot-in-the-door (simple thanks) | "Thank you." | 43% |
      | Foot-in-the-door (service) | "You have done me a great service." | 48% |
      | Foot-in-the-door + Labeling 1 | "You are helpful." | 70% |
      | Foot-in-the-door + Labeling 2 | "You are a really good person." | 78% |

      This experiment shows that the rate of altruism can be moved from 30% to 78% solely by changing a trivial interaction that took place a few minutes earlier.
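To see how much of that jump comes from the preparatory act alone and how much from the label added on top, the return rates can be compared against the control condition. This is a minimal sketch using only the five rates reported above; the condition names and the `gains_over_control` helper are hypothetical, introduced for illustration.

```python
# Return rates (in %) for the lost-banknote experiment, in the order reported above.
# Condition names are shortened, illustrative labels.
RETURN_RATES = [
    ("control", 30),
    ("simple thanks", 43),
    ("service acknowledged", 48),
    ("label: helpful", 70),
    ("label: really good person", 78),
]


def gains_over_control(rates):
    """Map each non-control condition to its gain, in percentage points, over control."""
    baseline = rates[0][1]
    return {name: rate - baseline for name, rate in rates[1:]}


if __name__ == "__main__":
    for name, gain in gains_over_control(RETURN_RATES).items():
        print(f"{name}: +{gain} points over control")
```

Read this way, helping the tourist and receiving a plain "thank you" adds 13 points, while the strongest label adds 48 points, so on these figures most of the effect is associated with the labeling rather than with the preparatory act by itself.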

      4. Psychological and Ethical Consequences

      4.1. The Naturalization of Behavior

      The most striking effect of commitment is that individuals are unaware of having been influenced. Asked why they acted as they did (e.g., returning the note), they consistently answer:

      "It's only normal, I am an altruistic/generous person."

      Meaning vs. determination:

      Meaning: The explanation the individual gives for their behavior (internal, tied to their personality).

      Determination: The real cause of the behavior (external, tied to the situation created by the experimenter).

      Individuals have no access to what actually determines their acts, and they replace it with a meaning that flatters their sense of self.

      4.2. The Boundary with Manipulation

      Professor Girandola stresses that these techniques are powerful and sit on the boundary of manipulation.

      Knowing them is essential not only for using them well (public health, education) but also for guarding against them.

      He points out that psychologists' use of these techniques is governed by a strict code of ethics: "there is no action without ethics."

      5. Recommended Reading and Resources

      To explore the topic further, several books and articles were mentioned:

      Reference works:

      ◦ Petit traité de manipulation à l'usage des honnêtes gens, by R.-V. Joule and J.-L. Beauvois.

      ◦ La soumission librement consentie, by the same authors.

      ◦ Psychologie sociale and Attitude et comportement, by F. Girandola.

      Online articles:

      ◦ Popular-science articles on The Conversation, notably on Donald Trump's use of manipulation techniques and on their use during sales periods.

      Video:

      ◦ A filmed re-enactment of the "lost banknote" experiment is available online.

      6. Conclusion and Outlook

      The presentation focused on the foundations of commitment theory and the foot-in-the-door technique.

      It was noted that other important aspects were not covered, notably:

      • The effects of commitment on opinions (via cognitive dissonance theory).

      • Escalation of commitment, a process in which an individual persists in a decision or behavior that proves harmful simply because they initially committed to it.

    1. water_well”
      [out:json][timeout:60];
      // Berlin city boundary (relation)
      {{geocodeArea:Berlin}}->.searchArea;
      // Search for water pumps within Berlin
      (
        node["man_made"="water_well"](area.searchArea);
        way["man_made"="water_well"](area.searchArea);
        relation["man_made"="water_well"](area.searchArea);
      );
      out body;
      >;
      out skel qt;
      a comprehensive collection of relevant pump locations was generated.

      I would not interrupt the sentence here with a code snippet. I would finish the sentence first, then insert the code snippet, and then continue with the text that follows.
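The query quoted above relies on `{{geocodeArea:Berlin}}`, a shortcut that the Overpass Turbo web interface expands before sending the request, so it would not work as-is against the plain Overpass API. The sketch below only assembles an equivalent query string in Python; it does not send anything. The function name `build_water_well_query` is hypothetical, and looking the area up by `name` plus `admin_level=4` is an assumption about how Berlin is tagged, not something stated in the source.

```python
def build_water_well_query(area_name: str = "Berlin",
                           admin_level: int = 4,
                           timeout_s: int = 60) -> str:
    """Build an Overpass QL query for man_made=water_well features in a named area."""
    # One selector per element type, mirroring the node/way/relation lines in the quoted query.
    selectors = "\n".join(
        f'  {kind}["man_made"="water_well"](area.searchArea);'
        for kind in ("node", "way", "relation")
    )
    return "\n".join([
        f"[out:json][timeout:{timeout_s}];",
        "// Search area looked up by tags (assumed stand-in for the geocodeArea shortcut)",
        f'area["name"="{area_name}"]["admin_level"="{admin_level}"]->.searchArea;',
        "// Water wells inside the search area",
        "(",
        selectors,
        ");",
        "out body;",
        ">;",
        "out skel qt;",
    ])


if __name__ == "__main__":
    print(build_water_well_query())
```

Keeping the query in a function makes it easy to place the full snippet after the sentence it belongs to, as the annotation suggests, and to reuse it for another area.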

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public reviews:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In this revision, we have:

      - clarified that while epidemics occur in other parts of sub-Saharan Africa, our results are consistent with the epidemiological narrative of RVF in The Gambia, characterised by sustained, moderate transmission without resulting in substantial outbreaks (hyperendemicity).

      - discussed how model assumptions (e.g. seasonality, homogeneous mixing) may bias our results toward an endemic quasi-equilibrium dynamic.

      - highlighted the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In this revision we have:

      - clarified this distinction in the manuscript to avoid overinterpretation.

      - emphasized the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause substantial abortions and neonatal deaths, these events occur during sporadic epidemics. In the Gambian context, where we’re not observing large outbreaks but rather low-level circulation, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model includes birth as a constant demographic process; it is reasonable to assume a stable population, since we are not explicitly modelling outbreak-scale reproductive losses. This approach is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we have acknowledged this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for the M subpopulation in the dry season were not included in the appendix due to an oversight, though demographic turnover was implemented in the model code. We have now added the missing equations to the appendix.

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay) is an important consideration in RVFV serology; however, whether this decline reflects a true loss of protective immunity following natural infection remains unknown. Available evidence suggests that infected cattle likely develop long-lasting immunity, and findings in humans further support this assumption, although longitudinal field data regarding RVFV-specific antibody durability in animals are not available to the best of our knowledge. From a modelling perspective, our objective was to estimate FOI and use it to predict an age-seroprevalence curve consistent with the observed cross-sectional age-seroprevalence patterns. We therefore adopted a parsimonious SIR framework, interpreting loss of seropositivity as a potential explanation for discrepancies between observed and predicted age-seroprevalence rather than explicitly modelling waning immunity. We have now:

      - clarified this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although evidence of seropositivity decay has been suggested in humans.

      - highlighted that while an SEIS/SIRS framework could theoretically generate different long-term dynamics, evaluating this approach requires stronger evidence for true immunity loss.
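To make the relationship between FOI, seropositivity decay, and an age-seroprevalence curve concrete, here is a minimal sketch. It assumes a constant FOI, which is a simplification: in the response the FOI comes from the transmission model and is seasonal. The function name, parameter names, and numerical values are illustrative assumptions, not the authors' code or estimates.

```python
import math


def expected_seroprevalence(age_years: float, foi: float, seroreversion: float = 0.0) -> float:
    """Expected proportion seropositive at a given age under a constant FOI.

    foi           -- force of infection per year (assumed constant here)
    seroreversion -- yearly rate at which detectable antibodies are lost (0 = lifelong)
    """
    total_rate = foi + seroreversion
    if total_rate == 0:
        return 0.0
    # With no seroreversion this reduces to 1 - exp(-foi * age).
    return (foi / total_rate) * (1.0 - math.exp(-total_rate * age_years))


if __name__ == "__main__":
    for age in (1, 5, 10):
        lifelong = expected_seroprevalence(age, foi=0.1)
        decaying = expected_seroprevalence(age, foi=0.1, seroreversion=0.05)
        print(f"age {age:>2}: lifelong={lifelong:.3f}  with decay={decaying:.3f}")
```

With any positive decay rate the curve plateaus below 100%, which is one way observed seroprevalence in older animals could fall below an SIR-based prediction.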

      (6) RVFV induced mortality in serocatalytic model

      We thank the reviewer for this comment and for raising an important conceptual point. However, the force of infection in our study is not estimated using a serocatalytic framework. Instead, FOI is estimated mechanistically within the transmission model as a function of the number of infectious cattle, rather than from age-stratified seroprevalence data.

      RVF-induced mortality is accounted for through its effect on the infectious compartment, where increased mortality reduces the number and duration of infectious cattle and therefore indirectly reduces FOI. Consequently, RVF-related cattle death does not need to be explicitly incorporated into the FOI expression itself. Seroreversion similarly does not influence FOI estimation under this modelling framework. We have clarified this distinction in the Methods section to avoid confusion between mechanistic transmission models and serocatalytic approaches.
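The argument that disease-induced mortality lowers FOI indirectly can be illustrated with a deliberately simple single-population SIR sketch. Everything here is an assumption made for illustration: the frequency-dependent form `beta * I / N`, the constant birth inflow, the forward Euler stepping, and all parameter values. It is not the authors' seasonal herd model.

```python
def simulate_foi(beta, gamma, mu, alpha, n0=1000.0, i0=1.0, days=365.0, dt=0.1):
    """Return the FOI time series of a basic SIR model with demography.

    beta  -- transmission rate per day
    gamma -- recovery rate per day
    mu    -- background death rate per day
    alpha -- additional disease-induced death rate per day (infectious animals only)
    """
    s, i, r = n0 - i0, i0, 0.0
    births = mu * n0  # constant birth inflow, an assumed simplification
    foi_series = []
    for _ in range(int(days / dt)):
        n = s + i + r
        foi = beta * i / n  # FOI computed from the infectious compartment
        foi_series.append(foi)
        ds = births - foi * s - mu * s
        di = foi * s - (gamma + mu + alpha) * i  # alpha shortens the infectious period
        dr = gamma * i - mu * r
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
    return foi_series


if __name__ == "__main__":
    without = simulate_foi(beta=0.5, gamma=0.1, mu=0.001, alpha=0.0)
    with_mortality = simulate_foi(beta=0.5, gamma=0.1, mu=0.001, alpha=0.1)
    print(f"peak FOI without disease mortality: {max(without):.3f}")
    print(f"peak FOI with disease mortality:    {max(with_mortality):.3f}")
```

Note that `alpha` never appears in the FOI expression itself; it acts only through the size and duration of the infectious compartment, which is the point made in the response.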

      (7) Clarifying previous vs. current study components

      We have revised the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We have expanded the limitations section to identify the sparse household movement data as contributing most to uncertainty. We have outlined how these limitations may have implications for our conclusions, and may lead to under- or over-estimation of periods of heightened transmission risk.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulation. On reassessment, we agree that our model structure may not be ideally suited to exploring a movement ban, and we have removed this analysis from the revised manuscript. We are currently developing separate work focused on RVF vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

      Reviewer #1 (Recommendations for the authors):

      We thank the reviewer for the recommendations regarding the Introduction, Methods, Results, and Supplementary Figures. We have addressed these points below and revised the manuscript accordingly.

      (1) Introduction: Should avoid describing as "inaccessible" the regions that are inhabited by nomadic and transhumant pastoralists.

      We have revised the wording to “hard-to-reach” regions.

      (2) Methods: Can the authors state what share of the animals included in the household survey data were cattle as opposed to other small ruminants? It would be helpful to understand what share of the data is "excluded"

      We have now included the total number of cattle sampled, providing clarity on the proportion of data used in the analyses.

      (3) Methods: When introducing the deterministic model, it seems unnecessary to mention the initialization conditions (i.e., introduction of a single infected individual at time 0) when this is later repeated in the Estimation of model parameters section, where it seems simulations were first conducted.

      We have removed the redundant description.

      (4) Results: Could the negative correlation between geographic distance of connected herds and mean seroprevalence simply indicate proximal exposure rather than common risk factors?

      We acknowledge that both mechanisms are plausible. RVFV transmission is strongly influenced by shared environmental factors that shape mosquito dynamics; however, direct transmission between proximal cattle herds may also occur through close contact with infectious tissues, bodily fluids, or contaminated materials. We have clarified this interpretation in the Results section.

      (5) Figure S5: inconsistent notation for the scaling factor parameter (tau), which is expressed in equations and tables as psi.

      We thank the reviewer for identifying this issue and have corrected all instances to ensure consistent use of tau throughout the manuscript.

      (6) Figure S6: Why a density plot, isn't the number of temporary extinctions (x-axis) discrete?

      We have replaced the density plot with a bar plot in Figure S6.

    1. Pi: The Minimal Agent Within OpenClaw

      Author: Armin Ronacher | Date: January 31, 2026

      Core Concept

      • Pi is a minimal coding agent that powers OpenClaw, which went viral as ClawdBot/MoltBot
      • Written by Mario Zechner
      • Philosophy: LLMs excel at writing and running code, so embrace this fully

        "LLMs are really good at writing and running code, so embrace this"

      Pi's Distinguishing Features

      • Minimal core design with the shortest system prompt of any known agent

        "it has the shortest system prompt of any agent that I'm aware of"

      • Only four tools: Read, Write, Edit, Bash

      • Extension system with persistent state across sessions

        "it makes up for its tiny core by providing an extension system that also allows extensions to persist state into sessions, which is incredibly powerful"

      • High software quality: no flickering, low memory, reliable

        "Pi itself is written like excellent software. It doesn't flicker, it doesn't consume a lot of memory, it doesn't randomly break"

      Architectural Philosophy: What's Intentionally Excluded

      • No MCP support by design (can use mcporter as workaround)
      • Self-extension over downloading

        "You ask the agent to extend itself. It celebrates the idea of code writing and running code"

      • Users can point agent to existing extensions and remix them rather than using pre-built packages

      Session Architecture

      • Sessions support multiple model providers without leaning into provider-specific features

        "a session can really contain many different messages from many different model providers"

      • Custom messages for extension state persistence

      • Sessions are trees: branching, navigation, and side-quest workflows

        "sessions in Pi are trees. You can branch and navigate within a session"

      • Enables fixing broken tools in branch without wasting main session context

      • Built-in hot reloading for iterative extension development

        "it has built-in hot reloading so that the agent can write code, reload, test it and go in a loop"

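The idea that sessions are trees, with side-quest branches that do not consume the main branch's context, can be sketched with a small data structure. This is an illustration only: `SessionNode`, `reply`, and `context` are hypothetical names, and nothing here reflects Pi's actual session format or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class SessionNode:
    """One message in a session tree (hypothetical structure)."""
    role: str
    content: str
    provider: str = "unknown"  # messages from different model providers can coexist
    parent: Optional["SessionNode"] = field(default=None, repr=False)
    children: List["SessionNode"] = field(default_factory=list, repr=False)

    def reply(self, role: str, content: str, provider: str = "unknown") -> "SessionNode":
        """Append a child message; calling this twice on the same node creates a branch."""
        child = SessionNode(role, content, provider, parent=self)
        self.children.append(child)
        return child

    def context(self) -> List[str]:
        """Messages from the root down to this node: what a model would see on this branch."""
        path = []
        node: Optional["SessionNode"] = self
        while node is not None:
            path.append(node.content)
            node = node.parent
        return list(reversed(path))


if __name__ == "__main__":
    root = SessionNode("user", "add a changelog command")
    main = root.reply("assistant", "tool call failed", provider="provider-a")
    side = main.reply("user", "side quest: fix the broken tool")
    fixed = side.reply("assistant", "tool fixed", provider="provider-b")
    resumed = main.reply("user", "tool is fixed, continue")  # second child of main
    print(resumed.context())
```

In this sketch the resumed branch carries only the messages on its own path, so the work done in the side branch does not appear in its context.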
      Extension Capabilities

      • Register tools for LLM to call
      • Custom TUI components: spinners, progress bars, file pickers, data tables, preview panes

        "Pi extensions can render custom TUI components directly in the terminal"

      • Demonstrated running Doom in the TUI (proof of flexibility)

      Notable Extensions (Armin's)

      • /answer: Extracts questions from agent's prose into formatted input box; avoids structured question dialogs

        "I don't use plan mode. I encourage the agent to ask questions and there's a productive back and forth"

      • /todos: Agent-accessible to-do list stored as markdown in .pi/todos

      • /review: Agent reviews code before human review; modeled after Codex UI for commits/diffs/PRs

        "As more code is written by agents, it makes little sense to throw unfinished work at humans before an agent has reviewed it first"

      • /control: Multi-agent communication; one Pi sends prompts to another

      • /files: Lists session files with Finder reveal, VS Code diff, quick-look

      Community Extensions

      Skills vs Extensions Philosophy

      • Skills are agent-generated, not downloaded

        "they are hand-crafted by my clanker and not downloaded from anywhere"

      • Example: Replaced browser automation CLIs/MCPs with a CDP skill
      • Skills are disposable: thrown away when not needed
      • Example skills: reading shared Pi sessions, commit message crafting, changelog updates, uv preference over pip
      • Full skills collection

      Key References

      Key Personalities

      • Mario Zechner - Pi author, "very grounded" approach
      • Peter (steipete) - OpenClaw creator, "sci-fi with a touch of madness"
      • Armin Ronacher - article author, Flask creator

      Takeaway

      • Software building software is the future

        "Part of the fascination that working with a minimal agent like Pi gave me is that it makes you live that idea of using software that builds more software"

      • OpenClaw's growth validates removing UI and connecting agents to chat interfaces

        "given its tremendous growth, I really feel more and more that this is going to become our future"

    1. Following acceptance, authors may pass their manuscript to the journal in any reasonable format (LaTeX or markdown preferred; Word and PDF acceptable). The document will be published in a “web-first” format, such as the Distill version of R Markdown. This allows reflowable text and mobile readability. We currently do not plan to support interactive content, as we do not think the large effort is worth the modest benefit.

      You don't have to host -- why not just evaluate and curate?

      Or you can have a compromise -- a 'traditional summary' in the journal, linking to the interactive version created by the author, the latter being the canonical one.

      NB, I think interactive content is high value, but the authors can produce it, especially given Claude Code etc.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum - an organism that unusually shows an acristate mitochondrion during the asexual part of its life cycle, which then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins, look at the timing of expression during the parasite life cycle, and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes singly and in parallel and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using EM techniques they show that the morphology of gametocyte mitochondria is abnormal in the knock out lines, although there is great variation.

      Major comments: The manuscript is interesting and is an intriguing use of a well studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail in areas around methodology and statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates and the statistical tests and data presentation are not clear enough to conclude the reduction in transmission that is claimed. Perhaps this could be improved with clearer text?

      We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.

      To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.

      Regarding mosquito experiments, while we indeed reported a reduction in transmission and oocyst numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence "with the transmission reduction of [numbers]...." and we included the sentence "The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers"

      More specific comments to address: Line 101/Fig1E (and figure legend) - What is this heatmap showing? It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide information.

      We added the information "high molecular mass gels with lower acrylamide percentage" to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).

      Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)? Please clarify.

      We thank the reviewer for pointing this out - this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and mitoTracker in U-ExM. The figure annotation has now been corrected.

      Figure 2C - what is the statistical test being used (the methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution." but what test is this?)?

      The statistical test is now described in the Materials and Methods section with the sentence "The fitted model was used to obtain estimated means and contrasts and were evaluated using Wald Statistics". The test is now also mentioned in the figure legend.

      Also the choice of a log10 scale for oocyst intensity is an unusual choice - how are the mosquitoes with 0 oocysts being represented on this graph? It looks like they are being plotted at 10^-1 (which would be 0.1 oocysts in a mosquito which would be impossible).

      As the data spans three orders of magnitude with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we went with a standard approach to handle 0s in log transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log‑scaled y‑axis and relabelled the lowest tick as '0'. This ensures that mosquitoes with zero oocysts are shown along the x‑axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
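The general problem of showing zero counts on a log axis can be sketched as follows. This is a generic illustration, not the authors' plotting code: zeros are given a display-only position one decade below the smallest positive value (to be labelled '0' after an axis break), while summary statistics use the true counts. The function name and the one-decade offset are assumptions.

```python
import math
from statistics import mean


def log_axis_positions(counts):
    """Map oocyst counts to log10 plotting positions.

    Positive counts are plotted at log10(count). Zeros get a display-only
    position one decade below the smallest positive count.
    """
    positive = [c for c in counts if c > 0]
    if not positive:
        return [0.0 for _ in counts]  # nothing to scale against; all points share one position
    zero_position = math.floor(math.log10(min(positive))) - 1
    return [math.log10(c) if c > 0 else float(zero_position) for c in counts]


if __name__ == "__main__":
    counts = [0, 1, 10, 100]
    print(log_axis_positions(counts))  # plotting positions only
    print(mean(counts))                # statistics use the true values, zeros included
```

Separating the display position from the data keeps the plotted placeholder out of any model fitting, in line with the statement that the true value 0 was used for statistics.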

      Figure 2D - it is great that the data from all feeding replicates has been shared, however it is difficult to conclude any meaningful impact on transmission with the knock-out lines when there is so much variation and so few mosquitoes dissected for some datapoints (10 mosquitoes is a very small sample size). For example, Exp1 shows a clear decrease in mic19- transmission, but then Exp2 does not really show as great an effect. Similarly, why does the double knock out have better transmission than the single knockouts? Surely there would be a greater effect?

      We agree with the reviewer and with the new sentence added, as per major point, we hope we clarified the concept. Note that original Figure 2D has been moved to the supplementary information, as per minor comment of another reviewer.

      Figure 3 legend - Please add which statistical test was used and the number of replicates.

      Done

      Figure 4 legend - Please add which statistical test was used and the number of replicates.

      Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.

      Figure 5C - the 3D reconstructions are very nice, but what does the red and yellow coloring show?

      Indeed, the information was missing. We added it to the figure legend.

      Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages." How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?

      Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.

      Furthermore, even though we do not discuss this in the article, we are aware of mitochondria-targeting drugs that are known to block mosquito transmission. We want to point out that it is difficult to distinguish the disruption of the ETC, and therefore an impact on energy conversion, from the impact on the essential pathway of pyrimidine synthesis, which is highly relevant in microgamete formation. Still, a recent paper by Sparkes et al. (2024) showed the essentiality of mitochondrial ATP synthesis during gametogenesis, so it is very likely that mitochondrial energy conversion is highly relevant for transmission to the mosquito.

      Reviewer #1 (Significance (Required)):

      This manuscript is a novel approach to studying mitochondrial biology and does open a lot of unanswered questions for further research directions. Currently there are limitations in the use of statistical tests and detail of methodology, but these could easily be addressed with a bit more analysis/better explanation in the text. This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research. My expertise is in Plasmodium cell biology.

      We thank the reviewer for the praise.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Major comments: 1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as it is the conformation of the dimers and rows that modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed, and likely dysfunctional (see below). Thus, I do not agree with the title that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.

      We thank the reviewer for taking the time to review our manuscript.

      Based on the reviewer's interpretation, we conclude that the title does not come across as intended. We have changed the title to: "The role of MICOS in organizing mitochondrial cristae in malaria parasites"

      The Discussion section starting from line 373 also suffers from overinterpretation as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) compared to the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate on the basis of such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating neither is epistatic.

      We do agree with the reviewer's notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.

      The following paragraphs (line 387 to 422) continue with such unnecessary overinterpretation to the point that it is confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript and should be shortened. Yes, maybe there are other putative MICOS subunits that may linger in the KOs that are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.

      We shortened this paragraph.

      2) While the authors went to impressive lengths to detect any effect on lifecycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration, membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have and not resorting to complexome profiling (although this would be definitive if it is feasible):

      i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.

      Interesting suggestion. As our staining and imaging conditions are suitable for such analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset which we collected for Figure 3. We did, however, not detect any difference in mitotracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary figure S6.

      ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.

      While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.

      iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.

      While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.

      To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.

      I will also conclude that complexome/shotgun proteomics may be a useful tool also for identifying other putative MICOS subunits by determining if proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.

      3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus have concluded that these are truly acristate. This can very well be true, or there can be immature cristae forms that evaded detection at the resolution they used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are pretty small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be even smaller still. Minute levels of sampled complex III and IV plus complex V dimers in ABS that were detected previously by the authors by complexome profiling would argue for the presence of minuscule and/or very few cristae.

      I think that the authors should hedge their claim that ABS is acristate by briefly stating that there still is a possibility that minuscule cristae may have been overlooked previously.

      We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer's point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing 'fully acristate' to 'acristate'.

      This brings me to the claim that Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag; a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed, but below the expression limits of the assay, as the protein exhibits low expression levels when mitochondrial activity is upregulated.

      We agree with the reviewer that the absence of a detectable epitope‑tag signal does not definitively exclude low‑level expression, and we have therefore replaced the term 'absent' with 'undetectable' throughout the manuscript. In light of previous findings of low-level transcripts of the proteins in studies by Lopez-Berragan et al. and Otto et al., we also added the sentence "The apparent absence could indicate that transcripts are not translated in ABS or that the proteins' expression was below detection limits of western blot analysis." to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall within the

      To address this point, the authors should determine if mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack either transcript. RT-qPCR using polyT primers can be employed to detect these transcripts. If the levels of these mRNAs are equivalent to dKO in WT ABS, the authors can make a pretty strong case for the absence of cristae in ABS.

      We appreciate the reviewer's suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.

      They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).

      Searching for the CX9C motifs is a valuable suggestion. In response to the reviewer's suggestion, we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1 F).

      Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.

      In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.

      5) Statistical significance. Sometimes my eyes see population differences that are considered insignificant by the statistical methods employed by the authors, eg Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods such as Student's t-test for pairwise comparisons?

      The graphs in figures 3, 4 and 5 have been redrawn as violin plots on a linear scale (also following a suggestion from further down in the reviewer's comments). We believe that this improves interpretability. ANOVA was retained as the statistical test to ensure correction for multiple comparisons, which cannot be performed with standard t-tests alone. A full overview of the statistics and exact p-values can also be found in the newly added supplementary information 2.
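The concern about uncorrected pairwise tests in the response above can be illustrated with a small sketch. It is not the authors' analysis: the line labels and p-values are hypothetical, the family-wise error formula assumes independent tests (pairwise comparisons that share a group are not strictly independent), and the Holm step-down procedure is used here only as a simple stand-in for the multiple-comparison correction applied after ANOVA.

```python
from itertools import combinations


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical line labels: four lines give six pairwise comparisons.
lines = ["WT", "KO-A", "KO-B", "dKO"]
pairs = list(combinations(lines, 2))

# Hypothetical raw p-values, one per pair, for illustration only.
raw_p = [0.012, 0.030, 0.001, 0.400, 0.049, 0.020]
adjusted_p = holm_adjust(raw_p)

for pair, p, q in zip(pairs, raw_p, adjusted_p):
    print(pair, p, round(q, 3))

# Family-wise error rate if all six tests were run uncorrected at 0.05.
print(round(familywise_error_rate(0.05, len(pairs)), 3))
```

With these made-up numbers, five of the six raw p-values fall below 0.05, but only one remains below 0.05 after adjustment, which is the kind of discrepancy between visual impression and corrected statistics that the reviewer describes.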

      Minor comments: Line 33. Anaerobes (eg Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria

      We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.

      Line 56: Unclear what authors mean by "canonical model of mitochondria"

      To clarify, we changed this to "yeast or human" model of mitochondria.

      Lines 75-76: This applies to Mic10 only

      We removed the "high degree of conservation in other cristate eukaryotes" statement.

      Line 80: Cite DOI: 10.1016/j.cub.2020.02.053

      Done

      Fig 2D: I find this table difficult to read. If authors keep table format, at least get rid of the 'mean' column, as this data is better depicted in 2C. I suggest depicting this data as in 3B, showing the portion of infected vs uninfected flies in all experiments, and then moving the modified Table to the supplement. It is important to point out that experiment 5 appears to be an outlier with reduced infectivity across all cell lines, including WT.

      To clarify: the mean reported in the table indicates the mean per replicate while the mean reported in figure 2C is the overall mean for a given genotype that corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided to not include a graph for infected and non-infected mosquitoes as this information would be partially misleading, highlighting a phenotype we argue to be influenced by the strong variability.

      Fig. 3C-G: I feel like these data repeatedly lead to same conclusions. These are all different ways of showing what is depicted in Fig 2B: mitochondria gross morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to supplement and replaced by the beautiful images.

      Thank you for the nice comment on our images. We have now moved part of the graphs to supplementary figure 6 and only kept the Relative Frequency, Sphericity and total mitochondria volume per cell in the main figure.

      Line 180: Be more specific with which tubulin isoform is used as a male marker and state why this marker was used in supplemental Fig S6.

      We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.

      Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.

      To clarify the biological effect that we can conclude from the measurement, we added an explanation about it in the respective section of the results, and we decided to replace the raw results of the plug-in readout with the deduced relative dispersion.

      Line 222: Report male/female crista measurements

      We added Supplementary information 2, which contains the exact statistical tests and outcomes for all presented quantifications as well as a per-sex statistical analysis of the data from figure 4. Correspondingly, we extended supplementary information 2 by a per-sex colour code for the thin section TEM data.

      Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems like the data spread is wider in the double KO. This would also solve the problem with the standard deviation extending beyond 0%.

      We changed this accordingly.

      Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).

      This has been changed accordingly.

      Line 320: incorrect citation. Related to point 1 above.

      Correct citation is now included in the text.

      Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.

      This has been changed accordingly.

      Line 343-345: The sentence and citation 45 are strange. Regarding the former, it is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. About the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Mitochondrial fragmentation is often a physiological response to stress. I would rewrite this so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least to present this as one of several possibilities.

      The sentence has been substituted as suggested by the reviewer, although we still include the data from human cells, as this has also been shown in Stephens et al. 2020.

      Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.

      Done

      Line 376-377: 'deplete functionality' does not make sense, especially in the context of talking about MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.

      Other suggestions for added value

      We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.

      1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)

      While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead co-migrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial intermembrane space bridging (MIB) complex are not mentioned in the manuscript, we decided to not include the information in the figure.

      Reviewer #2 (Significance (Required)):

      The manuscript by Tassan-Lugrezin is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask the question whether MICOS is essential for this process. They conclude based on their data that the answer is no, which the authors consider unprecedented. But even if their claim is true that ABS is acristate, this supposed advantage does not really bring any meaningful insight into how MICOS works in Plasmodium.

      First the positives of this manuscript. As has been the case with this research team, the manuscript is very sophisticated in the experimental approaches that are made. The highlights are the beautiful and often conclusive microscopy performed by the authors. Only the localization of Mic60 and Mic19 was inconclusive due to their very low expression unfortunately.

      The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Mitochondrial deformation is tolerated by life cycle stage differentiation, with a modest but significant reduction of oocyst production being observed.

      However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which from the title and focus of the manuscript, is the main goal of the authors. In its current form, the manuscript reports some potentially important findings:

      1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.

      2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.

      3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation

      4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of plasmodium itself than anything about the essentiality of Mic60, ie plasmodium life cycle progression tolerates defects to mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth in examined conditions does differ with its essentiality in other eukaryotes. But fitness costs were not assayed (eg by competition between mutants and WT in infection of mosquitoes)

      5) Decreased fitness of the mutants is implied by a reduction of oocyst formation.

      While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.

      This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS that sets it apart from other iterations is the apparent absence of the Mic10 subunit. Purification of plasmodium MICOS via the epitope tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in plasmodium.

      Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium. This is a salient issue as maybe crista morphology is decoupled from OXPHOS capacity in Plasmodium, which links to the apparent tolerance of mitochondrial morphology defects in cell growth and differentiation. I suggested in section A experiments to address this deficit.

      Finally, the authors could assay fitness costs of MICOS-ablation and associated phenotypes by assaying whether mosquito infectivity is reduced in the mutants when they are directly competing with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation may contribute to Plasmodium's remarkable ability to adapt.

      I realize that the authors put a lot of effort into their study and again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are still better ways to increase the impact of the study aside from overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.

      We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.

      With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for initial formation of cristae (as opposed to organization thereof), is a confirmation of something that has been assumed by the field, without being the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript. The result of MICOS not being involved in the initial formation of cristae, we argue to be relevant in Plasmodium biology and beyond. As for the insights into how MICOS works in Plasmodium we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin section and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      We thank the reviewer for their time and compliment.

      Major comments:

      1) The authors should improve how they present their findings in the right context, in particular by:

      (i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      We extended the introduction to include this information.

      (ii) at the end of the introduction, the motivating hypothesis is formulated ad hoc "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      To clarify, we rephrased the sentence to: "Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated."

      2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum but this is not compared to the expected length or the size in S. cerevisiae.

      To solve the reference issue, we added the uniprot IDs we compared to see that the annotated ORF is bigger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast all throughout the figure, but then talk about human in this specific sentence.

      Regarding whether the true N-terminus is known. Short answer: No, not exactly.

      However, we do know that the Pf version is about double the size of the yeast protein.

      As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.

      To clarify, we now emphasize that we ran the AlphaFold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chapell et al., now cited in the manuscript), and revised the wording to make clear what we are comparing in which sentence.

      3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins." The authors should explain which data they are referring to, as some of the data in Fig 3 and 4 look similar and all significance tests relate to the wild type, not between the different mutants, so it is not clear if any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      In reply to this and other comments from the reviewers, we added multiple-comparison testing among all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests used.

      4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      We deleted this statement.

      5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      This sentence has been removed.

      6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      This sentence has been deleted in the revised version of the manuscript.

      Minor comments:

      7) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      Title is changed accordingly

      Minor comments:

      • Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.

      Done, the paper is now cited

      • Page 2, line 58: for a more complete picture the authors should also cite the work of others here which shows that although at very low levels, e.g. complex III (a drug target) and ATP synthase do assemble (Nina et al, 2011, JBC).

      Done

      • Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.

      The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.

      • Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.

      We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).

      • Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked as low/very low. So, its existence is unknown, not just its "function".

      We adapted the domain description to "a stack of two parallel beta-sheets" and replaced the statement on unknown function by the statement "Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown."

      Fig 1B: The authors show two alphafold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is however an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible

      We appreciate the reviewer's suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full‑length proteins, we believe that including fragment‑based structures would be less informative in this context.

      Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods - how well was the antibody working in their hands and how do they interpret the absence of a WB band compared to transcriptomics data?

      The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall within the

      • Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.

      Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondria), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases accessibility for the antibodies. However, U-ExM comes at the cost of being prone to dotty background signals, which can hide low-abundance, naturally dotty signals such as those of MICOS proteins that localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we do not have experience with, nor access to, this particular technique/method.

      Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).

      The limitations of other methods are described in the respective results section.

      We added a clarifying sentence in the results section of Figure 4:

      "Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae."

      This statement refers to the length/width measurements of cristae.
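Why two-dimensional measurements can only indicate trends may be easier to see with a small numerical sketch. It rests on strong simplifying assumptions that are not taken from the manuscript: a crista is idealized as a straight segment with a uniformly random orientation in 3D, and the image is treated as an orthogonal projection onto the viewing plane. The true length used below is a hypothetical value. Real thin sections also truncate structures at the section boundaries, and curved or coiled cristae deviate further, so this idealization likely understates the effect.

```python
import math


def mean_projected_fraction(n_steps: int = 100_000) -> float:
    """Mean of sin(theta) over uniformly random 3D orientations.

    theta is the angle between the segment and the viewing axis. For
    uniformly distributed directions, theta on [0, pi/2] has density
    sin(theta), and the projected length is L * sin(theta).
    """
    d_theta = (math.pi / 2) / n_steps
    total = 0.0
    for k in range(n_steps):
        theta = (k + 0.5) * d_theta  # midpoint rule
        total += math.sin(theta) * math.sin(theta) * d_theta
    return total


true_length_nm = 100.0  # hypothetical true crista length
fraction = mean_projected_fraction()
apparent_length_nm = true_length_nm * fraction

print(round(fraction, 4))            # numerically close to pi / 4
print(round(apparent_length_nm, 1))  # mean apparent length in nm
```

Under these assumptions the mean apparent length is pi/4 (about 79%) of the true length, and individual measurements range from near zero to the full length depending on orientation alone.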

      In the context of Figure 4 D we mention the following (see preprint lines 229 - 230): "We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19,PfMIC60, or both."

      For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 - 273): "Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography-specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range."

      Line 404: perhaps undetected or similar would be a better description than "hidden"?

      The sentence does not exist in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle and defects in infection effectivity are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism.

      The limitation of the study stems from what is already known about MICOS and its subunits in great detail in yeast and humans, with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight on MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is however important given the divergence found in mitochondrial biology and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging as the morphological defects are diverse and the fitness defects appear moderate/mild.

      As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest for specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.

      We have revised the paper for clarity at all levels: motivation, application, and parameterization. We clarify that there is a large unmet need for using RSA in a trial-wise manner, and that this approach indeed offers benefits to any team interested in decoding trial-wise representational information linked to behavioral responses, and as such is not a problem specific to a single memory study.

      (2) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (pattern component modelling; Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is closely related to RSA (Diedrichsen and Kriegeskorte 2017). The authors may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA in which there is more than one repetition per stimulus.

      We appreciate the summary of relevant literature and have included a revised Introduction to address this bounty of relevant work. While much is owed to these authors, new developments from a diverse array of researchers outside of a single group can aid in new research questions, and should always have a place in our research landscape. We owe much to the work of Kriegeskorte’s group, and in fact, Schütt et al., 2023 served as a very relevant touchpoint in the Discussion and helped to highlight specific needs not addressed by the assessment of the “representational geometry” of an entire presented stimulus set. Principal among these needs is the application of trial-wise representational information that can be related to trial-wise behavioral responses and thus used to address specific questions on brain-behavior relationships. We invite the Reviewer to consider the utility of this shift with the following revisions to the Introduction.

      Page 3. “Recently, methodological advancements have addressed many known limitations in cRSA. For example, cross-validated distance measures (e.g., Euclidean distance) have improved the reliability of representational dissimilarities in the presence of noise and trial imbalance (Walther et al., 2016; Nili et al., 2014; Diedrichsen et al., 2021). Bayesian approaches such as pattern component modeling (Diedrichsen, Yokoi, & Arbuckle, 2018) have extended representational approaches to accommodate continuous stimulus features or temporal variation. Further, model comparison RSA strategies (Diedrichsen et al., 2021) and generalization techniques across stimuli (Schütt et al., 2023) have improved sensitivity and inference. Nevertheless, a common feature shared across most of these improvements is that they require stimulus repetition to examine the representational structure. This requirement limits their ability to probe brain-behavior questions at the level of individual events”.

      Page 8. “While several extensions of RSA have addressed key limitations in noise sensitivity, stimulus variance, and modeling (e.g., Diedrichsen et al., 2021; Schütt et al., 2023), our tRSA approach introduces a new methodological step by estimating representational strength at the trial level. This accounts for the multi-level variance structure in the data, affords generalizability beyond the fixed stimulus set, and allows one to test stimulus- or trial-level modulations of neural representations in a straightforward way”.

      Page 44. “Despite such prevalent appreciation for the neurocognitive relevance of stimulus properties, cRSA often does not account for the fact that the same stimulus (e.g., “basketball”) is seen by multiple subjects and produces statistically dependent data, an issue addressed by Schütt et al., 2023, who developed cross validation and bootstrap methods that explicitly model dependence across both subjects and stimulus conditions”.

      (3) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. The authors define this as the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least twice, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs.

      In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data comes from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors could spell out the specific problem addressed by their method right at the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".
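
      The estimator described in this comment can be illustrated with a small simulation. The sketch below is not taken from the manuscript or from any RSA toolbox; it assumes zero-mean true patterns, independent Gaussian noise across the two measurements, and made-up values for the number of stimuli, the number of voxels, and the two standard deviations. Under those assumptions, the average cross-measurement product of patterns for the same stimulus estimates $\sigma^2_{u}$, whereas the same product computed within a single measurement also absorbs the noise variance.

```python
import random

def simulate_patterns(n_stimuli, n_voxels, sigma_u, sigma_e, rng):
    """Return two independent measurements of y_{j,n} = u_j + e_{j,n}."""
    true = [[rng.gauss(0.0, sigma_u) for _ in range(n_voxels)]
            for _ in range(n_stimuli)]

    def measure():
        # Fresh noise is drawn for every call, so measurements are independent.
        return [[u + rng.gauss(0.0, sigma_e) for u in pattern] for pattern in true]

    return measure(), measure()

def mean_cross_product(y1, y2):
    """Average per-voxel product between matching stimulus patterns."""
    total, count = 0.0, 0
    for p1, p2 in zip(y1, y2):
        total += sum(a * b for a, b in zip(p1, p2))
        count += len(p1)
    return total / count

rng = random.Random(0)
# Hypothetical settings: true signal variance 1.0, noise variance 4.0.
y1, y2 = simulate_patterns(n_stimuli=40, n_voxels=200,
                           sigma_u=1.0, sigma_e=2.0, rng=rng)

estimate = mean_cross_product(y1, y2)  # noise terms cancel in expectation
naive = mean_cross_product(y1, y1)     # signal variance plus noise variance
```

      With a single measurement per stimulus only the second quantity is available, which is the design constraint this comment is concerned with.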

      At a general level, our approach rests on the premise that there is meaningful information present in a single presentation of a given stimulus. This assumption may have less utility when the research goals are more focused on estimating the fidelity of signal patterns for RSA, as in designs with multiple repetitions. But it is an exaggeration to state that such a trial-wise approach cannot address the difference between “true” stimulus patterns and noise. This trial-wise approach has explicit utility in relating trial-wise brain information to trial-wise behavior, across multiple cognitive domains (not only memory studies, as applied here). We have added substantial text to the Introduction distinguishing cRSA, which is widely employed, often in cases with a single repetition per stimulus, from model-comparative methods that employ multiple repetitions. We clarify that we do not consider tRSA an alternative to the model-comparative approach, and discuss that operational definitions of representational strength are constrained by the study design.

      Page 3. “In this paper, we present an advancement termed trial-level RSA, or tRSA, which addresses these limitations in cRSA (not model comparison approaches) and may be utilized in paradigms with or without repeated stimuli”.

      Page 4. “Representational geometry usually refers to the structure of similarities among repeated presentations of the same stimulus in the neural data (as captured in the brain RSM) and is often estimated utilizing a model comparison approach, whereas representational strength is a derived measure that quantifies how strongly this geometry aligns with a hypothesized model RSM. In other words, geometry characterizes the pattern space itself, while representational strength reflects the degree of correspondence between that space and the theoretical model under test”.

      Finally, we clarified that in our simulation methods we assume a true underlying activity pattern and a random error pattern. The model RSM is computed based on the true pattern, whereas the brain RSM comes from the noisy pattern, not the model RSM itself.

      Page 9. “Then, we generated two sets of noise patterns, which were controlled by parameters $\sigma_A$ and $\sigma_B$, respectively, one for each condition”.
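
      To make the simulation logic described above concrete, the following is a minimal sketch and not the authors' R code. It assumes i.i.d. standard normal true patterns, additive Gaussian noise whose standard deviation depends on condition, and Pearson correlation as the similarity measure; the trial count, voxel count, and the two noise levels are hypothetical. The model RSM is computed from the true patterns and the brain RSM from the noisy patterns.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rsm(patterns):
    """Trial-by-trial similarity matrix of Pearson correlations."""
    return [[pearson(p, q) for q in patterns] for p in patterns]

rng = random.Random(1)
n_per_condition, n_voxels = 6, 50      # hypothetical sizes
sigma = {"A": 0.5, "B": 3.0}           # hypothetical noise levels per condition
conditions = ["A"] * n_per_condition + ["B"] * n_per_condition

# One true pattern per trial, then one noisy observation of each.
true_patterns = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
                 for _ in conditions]
noisy_patterns = [[v + rng.gauss(0.0, sigma[c]) for v in pattern]
                  for pattern, c in zip(true_patterns, conditions)]

model_rsm = rsm(true_patterns)   # ground-truth similarity structure
brain_rsm = rsm(noisy_patterns)  # what would be observed

# How well each noisy pattern preserves its own true pattern, by condition.
fidelity = [pearson(t, n) for t, n in zip(true_patterns, noisy_patterns)]
mean_fidelity_a = sum(fidelity[:n_per_condition]) / n_per_condition
mean_fidelity_b = sum(fidelity[n_per_condition:]) / n_per_condition
```

      In this toy setup the lower-noise condition retains more of the true pattern, which is the kind of condition difference a representational strength measure is meant to detect.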

      (4) The notation in the paper is often conflicting and should be clarified. The actual true and measured activity patterns should receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that $\sigma_{ijk}$ are the noise variances (not standard deviations)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as seems to be done here).

      We have added notations for true and measured activity patterns to differentiate them from our notation for variance. We agree that multilevel models are usually defined at the level of means rather than at the level of variances and we include a Figure (Fig 1D) that describes the model in terms of the means. We clarify that the $\sigma$ terms used in the manuscript were not variances or standard deviations themselves; rather, they were meant to denote components of the actual (multilevel) variance parameter. Each component was sampled from a normal distribution, and together they summed to the final variance parameter for each trial. We have changed our notation for each component to the lowercase letter s to minimize confusion. We have also made our R code publicly available on our lab GitHub, which should provide more clarity on the exact simulation process.

      (5) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.

      We have added justification of the mixed-effects model given the potential assumption violations. We caution readers to investigate the robustness of their models, and to employ permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. Finally, we agree that the first simulation setting does not possess several properties of realistic RDMs/RSMs; however, we believe that there is utility in understanding the mathematical properties of correlations – an essential component of RSA – in a straightforward simulation where the ground truth is known. We have therefore moved this simulation to Appendix 1.
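
      The positive semi-definiteness concern raised in this comment can be seen in the smallest possible case. The sketch below is an illustration only and uses a hypothetical helper name. For a 3 x 3 matrix with unit diagonal and off-diagonal entries in [-1, 1], Sylvester's criterion reduces to requiring a non-negative determinant, so three correlations drawn independently of one another can easily describe a matrix that no set of real patterns could produce.

```python
def is_valid_3x3_correlation(r12, r13, r23):
    """True if the unit-diagonal 3x3 matrix with these off-diagonals is PSD."""
    in_range = all(-1.0 <= r <= 1.0 for r in (r12, r13, r23))
    # Determinant of [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]].
    determinant = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return in_range and determinant >= 0

# Three stimuli that are all strongly similar to each other: achievable.
consistent = is_valid_3x3_correlation(0.9, 0.9, 0.9)

# Stimulus 1 is strongly similar to both 2 and 3, yet 2 and 3 are strongly
# dissimilar: each cell is a legal correlation, but the matrix is not.
inconsistent = is_valid_3x3_correlation(0.9, 0.9, -0.9)
```

      The second case shows why the cells of a similarity matrix constrain one another and cannot, in general, be sampled independently.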

      (6) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.

      The Reviewer is correct that the voxel values in the true pattern are drawn from i.i.d. standard normal distributions. We take the Reviewer’s suggestion of “condition-specific pattern” to mean that there could be a condition-voxel interaction in two non-mutually exclusive ways. The first is additive, essentially some common underlying multi-voxel pattern like [6, 34, -52, …, 8] for all condition A trials, and a different such pattern for condition B trials, etc. The second is multiplicative, essentially a vector of scaling factors [x1.5, x0.5, x0.8, …, x2.7] for all condition A trials, and a different such vector for condition B trials, etc. Both possibilities could indeed affect tRSA as much as they would cRSA.

      Importantly, if such a strong condition-specific pattern is expected, one can build a condition-specific model RDM using one-hot coding of conditions (see example figure; src: https://www.newbi4fmri.com/tutorial-9-mvpa-rsa), either to capture this interesting phenomenon or to remove it as a confounding factor. This practice has been applied in multiple regression cRSA approaches (e.g., Cichy et al., 2013) and can also be applied to tRSA.
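
      A condition-specific model RDM of this kind can be sketched in a few lines. The example below is illustrative and not drawn from the manuscript or the linked tutorial: it assumes each trial carries a single condition label, codes each label as an indicator (one-hot) vector, and defines dissimilarity as one minus the dot product of two codes. The function names and the four-trial label list are hypothetical.

```python
def one_hot(labels):
    """Code each label as an indicator vector over the sorted set of levels."""
    levels = sorted(set(labels))
    return [[1 if label == level else 0 for level in levels] for label in labels]

def condition_model_rdm(labels):
    """Dissimilarity is 0 for trials sharing a condition and 1 otherwise."""
    codes = one_hot(labels)
    return [[1 - sum(a * b for a, b in zip(ci, cj)) for cj in codes]
            for ci in codes]

labels = ["A", "A", "B", "B"]   # hypothetical trial order
rdm = condition_model_rdm(labels)
# rdm is [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```

      Entered as an additional predictor, such a matrix would absorb variance shared by all trials of a condition, leaving the model of interest to explain what remains.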

      (7) The trial-level brain RDM to model Spearman correlations was analyzed using a mixed effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, which is an assumption of the mixed effect model. This does not seem to induce an increase in Type I errors in the conditions studied, but the procedure still needs a clear justification.

      We appreciate this important warning, and now caution readers to investigate the robustness of their models, and consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the supplement.

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models. The multilevel structure of RSA data introduces potential dependencies across subjects, stimuli, and trials, which can violate assumptions of independence if not properly modeled. In the present study, we used a model that included random intercepts for both subjects and stimuli, which accounts for variance at these levels and improves the generalizability of fixed-effect estimates. Still, there is a potential for systematic dependence across trials within a subject. To ensure that the model assumptions were satisfied, we conducted a series of diagnostic checks on an exemplar ROI (right LOC; middle occipital gyrus) in the Object Perception dataset, including visual inspection of residual distributions and autocorrelation (Appendix 3, Figure 13). These diagnostics supported the assumptions of normality, homoscedasticity, and conditional independence of residuals. In addition, we conducted permutation-based inference, similar to prior improvements to cRSA (Nili et al., 2014), using a nested model comparison to test whether the mean similarity in this ROI was significantly greater than zero. The observed likelihood ratio test statistic fell in the extreme tail of the null distribution (Appendix 3, Figure 14), providing strong nonparametric evidence for the reliability of the observed effect. We emphasize that this type of model checking and permutation testing is not merely confirmatory but can help validate key assumptions in RSA modeling, especially when applying mixed-effects models to neural similarity data. Researchers are encouraged to adopt similar procedures to ensure the robustness and interpretability of their findings”.

      Exemplar Permutation Testing

      To test whether the mean representational strength in the ROI right LOC (middle occipital gyrus) was significantly greater than zero, we used a permutation-based likelihood ratio test implemented via the permlmer function. This test compares two nested linear mixed-effects models fit using the lmer function from the lme4 package, both including random intercepts for Participant and Stimulus ID to account for between-subject and between-item variability.

      The null model excluded a fixed intercept term, effectively constraining the mean similarity to zero after accounting for random effects:

      ROI ~ 0 + (1 | Participant) + (1 | Stimulus)

      The full model included the same random effects structure but allowed the intercept to be freely estimated:

      ROI ~ 1 + (1 | Participant) + (1 | Stimulus)

      By comparing the fit of these two models, we directly tested whether the average similarity in this ROI was significantly different from zero. Permutation testing (1,000 permutations) was used to generate a nonparametric p-value, providing inference without relying on normality assumptions. The full model, which estimated a nonzero mean similarity in the right LOC (middle occipital gyrus), showed a significantly better fit to the data than the null model that fixed the mean at zero (χ²(1) = 17.60, p = 2.72 × 10⁻⁵). The permutation-based p-value obtained from permlmer confirmed this effect as statistically significant (p = 0.0099), indicating that the mean similarity in this ROI was reliably greater than zero. These results support the conclusion that the right LOC contains representational structure consistent with the HMAXc2 RSM. The distribution of the permuted likelihood ratio statistics is plotted together with the observed statistic in Appendix 3, Figure 14.
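
      The logic of a permutation p-value can be shown with a much simpler stand-in than the nested mixed-model comparison described above. The sketch below is not the permlmer procedure and ignores the Participant and Stimulus random effects entirely. It swaps in a one-sample sign-flip test, which assumes the trial-level values are symmetric around zero under the null, and it runs on made-up similarity values rather than the study data.

```python
import random
from statistics import fmean

def sign_flip_p_value(values, n_permutations, rng):
    """One-sided permutation p-value for the mean being greater than zero."""
    observed = fmean(values)
    exceed = 0
    for _ in range(n_permutations):
        # Under the null, each value is equally likely to carry either sign.
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if fmean(flipped) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value above zero.
    return (exceed + 1) / (n_permutations + 1)

rng = random.Random(2)
# Hypothetical trial-level similarity values with a small positive mean.
similarities = [rng.gauss(0.05, 0.1) for _ in range(200)]
p_value = sign_flip_p_value(similarities, n_permutations=1000, rng=rng)
```

      Because the observed statistic is counted among the permutations, the smallest attainable p-value is 1 / (n_permutations + 1), so the number of permutations sets the resolution of the test.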

      (8) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.

      We agree that a comparison between mean row-wise Spearman correlations and the matrix-wise Spearman correlation is needed. We believe that the simulations are the best approach for this comparison, since they are much more robust than the empirical dataset and have the advantage of knowing the true pattern/noise levels. We expand on our comparison of mean tRSA values and matrix-wise Spearman correlations on page 42.

      Page 42. “Although tRSA and cRSA both aim to quantify representational strength, they differ in how they operationalize this concept. cRSA summarizes the correspondence between RSMs as a single measure, such as the matrix-wise Spearman correlation. In contrast, tRSA computes such correspondence for each trial, enabling estimates at the level of individual observations. This flexibility allows trial-level variability to be modeled directly, but also introduces subtle differences in what is being measured. Nonetheless, our simulations showed that, although numerical differences occasionally emerged—particularly when comparing between-condition tRSA estimates to within-condition cRSA estimates—the magnitude of divergence was small and did not affect the outcome of downstream statistical tests”.
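
      The two operationalizations contrasted in this passage can be written side by side. The sketch below is a reading of the description above rather than the authors' implementation: it assumes the matrix-wise measure is a Spearman correlation over the upper triangles of the brain and model RSMs, and that the row-wise measure is a Spearman correlation of each row with the matching model row after dropping the diagonal cell. The two 4 x 4 matrices are made up for illustration.

```python
import math

def ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def matrix_wise(brain, model):
    """One value per matrix pair, computed over the upper triangle."""
    n = len(brain)
    cells = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return spearman([brain[i][j] for i, j in cells],
                    [model[i][j] for i, j in cells])

def row_wise(brain, model):
    """One value per trial: each row against its model row, diagonal dropped."""
    n = len(brain)
    return [spearman([brain[i][j] for j in range(n) if j != i],
                     [model[i][j] for j in range(n) if j != i])
            for i in range(n)]

# Made-up similarity matrices for four trials.
model = [[1.0, 0.8, 0.2, 0.1], [0.8, 1.0, 0.3, 0.2],
         [0.2, 0.3, 1.0, 0.7], [0.1, 0.2, 0.7, 1.0]]
brain = [[1.0, 0.5, 0.1, 0.2], [0.5, 1.0, 0.4, 0.0],
         [0.1, 0.4, 1.0, 0.6], [0.2, 0.0, 0.6, 1.0]]

single_value = matrix_wise(brain, model)
per_trial = row_wise(brain, model)
mean_per_trial = sum(per_trial) / len(per_trial)
```

      Each off-diagonal cell enters the row-wise measure twice, once in each of its two rows, and ranks are computed within a row rather than across the whole matrix, so the mean of the per-trial values need not equal the matrix-wise value.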

      (9) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.

      We recognize the potential for condition-specific patterns and chose to constrain the analyses to those most comparable with cRSA. However, depending on their hypotheses, researchers may consider testing condition RSMs and utilizing a model comparison approach, or employing the z-scored approach used in the simulations above. Regarding potential run confounds, this is always a concern in RSA, which is why we exclude within-run comparisons. We have also added to the Discussion the suggestion to include run as a covariate in mixed-effects models. However, we do not employ this covariate here as we preferred the most parsimonious model to compare with cRSA.

      Page 46 - 47. “Further, while analyses here were largely employed to be comparable with cRSA, researchers should consider taking advantage of the flexibility of the mixed-effects models and include covariates of non-interest (run, trial order, etc.)”.

      (10) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli.

      We agree that studies advancing designs with multiple repetitions of a given stimulus image are useful in estimating the reliability of concept representations. We would argue however that model comparison in RSA is not restricted to such data. Many extant studies do not in fact have multiple repetitions per stimulus per subject (Wang et al., 2018 https://doi.org/10.1088/1741-2552/abecc3, Gao et al, 2022 https://doi.org/10.1093/cercor/bhac058, Li et al, 2022 https://doi.org/10.1002/hbm.26195, Staples & Graves, 2020 https://doi.org/10.1162/nol_a_00018) that allow for that type of model-comparative approach. While beneficial in terms of noise estimation, having multiple presentations was not a requirement for implementing cRSA (Kriegeskorte, 2008 https://doi.org/10.3389/neuro.06.004.2008). The aim of this manuscript is to introduce the tRSA approach to the broad community of researchers whose research questions and datasets could vary vastly, including but not limited to the number of repeated presentations and the balance of trial counts across conditions.

      (11) Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here.

      We have added language on the value of cross-validation approaches to RSA in the Discussion:

      Page 47. “Additionally, while our proposed tRSA framework provides a flexible and statistically principled approach for modeling trial-level representational strength, we acknowledge that there are alternative methods for addressing trial-level variability in RSA. In particular, the use of cross-validated distance metrics (e.g., crossnobis distance) has become increasingly popular for controlling differences in measurement noise variance and accounting for possible covariance structures across trials (Walther et al., 2016). These metrics offer several advantages, including unbiased estimation of representational dissimilarities under Gaussian noise assumptions and improved generalization to unseen data. However, cross-validated distances are conceptually distinct from the approach taken here: whereas cross-validation aims to correct for noise-related biases in representational dissimilarity matrices, our trial-level RSA method focuses on estimating and modeling the variability in representation strength across individual trials using mixed-effects modeling. Rather than proposing a replacement for cross-validated RSA, tRSA adds a complementary tool to the methodological toolkit, one that supports hypothesis-driven inference about condition effects and trial-level covariates, while leveraging the full structure of the data”.
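
      The noise-bias correction that cross-validated distances provide can be illustrated with a toy example. The sketch below is a simplified stand-in and not the crossnobis estimator itself: it omits spatial prewhitening, uses exactly two independent partitions, and assumes i.i.d. Gaussian noise. Both simulated conditions share the same true pattern, so their true distance is zero; the voxel count and noise level are hypothetical.

```python
import random

def crossvalidated_sq_distance(a1, b1, a2, b2):
    """Per-voxel product of the condition difference in two partitions."""
    return sum((x1 - y1) * (x2 - y2)
               for x1, y1, x2, y2 in zip(a1, b1, a2, b2)) / len(a1)

def naive_sq_distance(a, b):
    """Per-voxel squared Euclidean distance within a single partition."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(3)
n_voxels, noise_sd = 2000, 1.0                 # hypothetical settings
shared_true = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

def measure():
    """One noisy measurement of the shared true pattern."""
    return [v + rng.gauss(0.0, noise_sd) for v in shared_true]

# Conditions a and b, each measured once in partition 1 and once in partition 2.
a1, b1, a2, b2 = measure(), measure(), measure(), measure()

cv_distance = crossvalidated_sq_distance(a1, b1, a2, b2)  # centred on zero
naive_distance = naive_sq_distance(a1, b1)                # inflated by noise
```

      The naive distance is positive even though the two conditions are identical, because noise in one partition is squared; the cross-validated version multiplies differences from independent partitions, so the noise contribution averages out.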

      (12) One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.

      Indeed, the chosen model RSM may not be the true RSM, but as the noise level increases the correlation between RSMs practically becomes zero. In our simulations we assume this to be true as a straightforward way to manipulate the correspondence between the brain data and the model. However, just like cRSA, tRSA is constrained by the model selections the researchers employ. We encourage researchers to have carefully considered theoretically-motivated models and, if their research questions require, consider multiple and potentially competing models. Furthermore, the trial-wise estimates produced by tRSA encourage testing competing models within the multiple regression framework. We have added this language to the Discussion.

      Page 46. “...choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should consider if their models and model alternatives”.

      Pages 45-46. “While a number of studies have addressed the validity of measuring representational geometry using designs with multiple repetitions, a conceptual benefit of the tRSA approach is the reliance on a regression framework that engenders the testing of competing conceptual models of stimulus representation (e.g., taxonomic vs. encyclopedic semantic features, as in Davis et al., 2021)”.

      Reviewer #2 (Public review):

      (1) While I generally welcome the contribution, I take some issue with the accusatory tone of the manuscript in the Introduction. The text there (using words such as 'ignored variances', 'erroneous inferences', 'one must', 'not well-suited', 'misleading') appears aimed at turning cRSA into a 'straw man' with many limitations that other researchers have not recognized but that the new proposed method supposedly resolves. This can be written in a more nuanced, constructive manner without accusing the numerous users of this popular method of ignorance.

      We apologize for the unintended accusatory tone. We have clarified the many robust approaches to RSA and have made our Introduction and Discussion more nuanced throughout (see also 3, 11 and 16).

      (2) The described limitations are also not entirely correct, in my view: for example, statistical inference in cRSA is not always done using classic parametric statistics such as t-tests (cf Figure 1): the rsatoolbox paper by Nili et al. (2014) outlines non-parametric alternatives based on permutation tests, bootstrapping and sign tests, which are commonly used in the field. Nor has RSA ever been conducted at the row/column level (here referred to by the authors as 'trial level'; cf King et al., 2018).

      We agree there are numerous methods that go beyond cRSA and address these limitations. We have added discussion of them to our manuscript, as well as an example analysis implementing permutation tests on tRSA data (see response to 7). We thank the reviewer for bringing King et al., 2014 and their temporal generalization method to our attention; we have added a reference to acknowledge their decoding-based temporal generalization approach.

      Page 8. “It is also important to note that some prior work has examined similarly fine-grained representations in time-resolved neuroimaging data, such as the temporal generalization method introduced by King et al. (see King & Dehaene, 2014). Their approach trains classifiers at each time point and tests them across all others, resulting in a temporal generalization matrix that reflects decoding accuracy over time. While such matrices share some structural similarity with RSMs, they do not involve correlating trial-level pattern vectors with model RSMs nor do their second-level models include trial-wise, subject-wise, and item-wise variability simultaneously”.

      (3) One of the advantages of cRSA is its simplicity. Adding linear mixed effects modeling to RSA introduces a host of additional 'analysis parameters' pertaining to the choice of the model setup (random effects, fixed effects, interactions, what error terms to use) - how should future users of tRSA navigate this?

      We appreciate the opportunity to offer more specific prescriptions for those employing a tRSA technique, and have added them to the Discussion:

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models and choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (4) Here, only a single real fMRI dataset is used with a quite complicated experimental design for the memory part; it's not clear if there is any benefit of using tRSA on a simpler real dataset. What's the benefit of tRSA in classic RSA datasets (e.g., Kriegeskorte et al., 2008), with fixed stimulus conditions and no behavior?

      To clarify, our empirical approach uses two different tasks: an Object Perception task more akin to the classic RSA datasets employing passive viewing, and a Conceptual Retrieval task that more directly addresses the benefits of the trial-wise approach. We consider our Object Perception dataset a simpler empirical fMRI dataset without explicit task conditions or a dichotomous behavioral outcome, whereas the Retrieval dataset is more involved (though old/new recognition is the most common form of memory retrieval testing) and dependent on behavioral outcomes. However, we recognize the utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (5) The cells of an RDM/RSM reflect pairwise comparisons between response patterns (typically a brain but can be any system; cf Sucholutsky et al., 2023). Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. Does this raise issues with the validity of the linear mixed effects model? Does it assume the observations are linearly independent?

      We recognize the potential danger of not meeting model assumptions. Though our simulation results and model checks suggest this is not a fatal flaw in the model design, we caution readers to investigate the robustness of their models, and to consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. See response to R1.
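
For readers unfamiliar with permutation testing, the following is a minimal sketch of one common variant, a sign-flip test on per-subject condition differences. It is an illustration only, not the analysis reported in the Appendix; the function name, the input format (one difference score per subject), and the simulated values are all assumptions made for the example.

```python
import numpy as np

def sign_flip_permutation_test(diffs, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test of a zero mean difference.

    diffs: one value per subject, e.g. mean tRSA estimate in condition B
    minus condition A (hypothetical input format, assumed for illustration).
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, dtype=float)
    observed = diffs.mean()
    # Under the null, each subject's difference is equally likely to be + or -.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null_means = (signs * diffs).mean(axis=1)
    # Add 1 to numerator and denominator so the p-value is never exactly zero.
    p_value = (1 + np.sum(np.abs(null_means) >= abs(observed))) / (n_perm + 1)
    return observed, p_value

# Simulated per-subject differences (made-up values, 30 subjects each).
rng = np.random.default_rng(1)
null_diffs = rng.normal(0.0, 0.05, size=30)     # no true condition effect
effect_diffs = rng.normal(0.08, 0.05, size=30)  # a true condition effect
obs_null, p_null = sign_flip_permutation_test(null_diffs)
obs_eff, p_eff = sign_flip_permutation_test(effect_diffs)
```

The null distribution is built from the data themselves, so the test relies on exchangeability of signs across subjects rather than on independence among RSM cells.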

      (6) The manuscript assumes the reader is familiar with technical statistical terms such as Type I/II error, sensitivity, specificity, homoscedasticity assumptions, as well as linear mixed models (fixed effects, random effects, etc). I am concerned that this jargon makes the paper difficult to understand for a broad readership or even researchers currently using cRSA that might be interested in trying tRSA.

      We agree this jargon may make the paper difficult to understand. We have expanded or added definitions of these terms throughout the Methods and Results sections.

      Page 12. “Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> = 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the correct inference should be a failure to reject the null hypothesis of b<sub>B-𝐴</sub> = 0; any significant (𝑝 < 0.05) result in either direction was considered a false positive (spurious effect, or Type I error). Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> < 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the inference was considered correct if it rejected the null hypothesis of b<sub>B-𝐴</sub> = 0 and yielded the expected sign of the estimated contrast (b<sub>B-𝐴</sub> < 0). A significant result with the reverse sign of the estimated contrast (b<sub>B-𝐴</sub> > 0) was considered a Type I error, and a nonsignificant (𝑝 ≥ 0.05) result was considered a false negative (failure to detect a true effect, or Type II error)”.

      Page 2. “Compared to cRSA, the multi-level framework of tRSA was both more theoretically appropriate and significantly more sensitive to (better able to detect) true effects”.

      Page 25. “The performance of cRSA and tRSA was quantified with their specificity (ability to avoid false positives, 1 - Type I error rate) and sensitivity (ability to avoid false negatives, 1 - Type II error rate)”.
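
The scoring rules quoted above can be sketched as follows. This is a hedged illustration rather than the authors' simulation code: the function names, the `true_sign` argument, and the (p-value, estimate) pairs are invented for the example, and each rate is computed within its own set of simulation runs.

```python
def classify_inference(p_value, estimate, true_sign, alpha=0.05):
    """Label one simulated inference.

    true_sign is 0 when the data were generated without a condition
    difference, and -1 or +1 for the expected sign of a true difference.
    """
    significant = p_value < alpha
    if true_sign == 0:
        # Any significant result under the null is a false positive.
        return "type_I" if significant else "correct"
    if not significant:
        # A true effect that goes undetected is a false negative.
        return "type_II"
    estimated_sign = 1 if estimate > 0 else -1
    # A significant effect with the wrong sign also counts as a Type I error.
    return "correct" if estimated_sign == true_sign else "type_I"


def rate(labels, kind):
    """Proportion of inferences carrying the given label."""
    return sum(label == kind for label in labels) / len(labels)


# Made-up (p-value, estimate) pairs standing in for simulation output.
null_runs = [(0.20, 0.01), (0.03, 0.10), (0.60, -0.02), (0.45, 0.00)]
effect_runs = [(0.01, -0.30), (0.30, -0.10), (0.02, 0.20), (0.001, -0.50)]

null_labels = [classify_inference(p, b, true_sign=0) for p, b in null_runs]
effect_labels = [classify_inference(p, b, true_sign=-1) for p, b in effect_runs]

specificity = 1 - rate(null_labels, "type_I")     # 1 - Type I error rate
sensitivity = 1 - rate(effect_labels, "type_II")  # 1 - Type II error rate
```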

      Page 6. “One of the fundamental assumptions of general linear models (step 4 of cRSA; see Figure 1D) is homoscedasticity or homogeneity of variance — that is, all residuals should have equal variance”.

      Page 11. “Specifically, a linear mixed-effects model with a fixed effect of condition (which estimates the average effect across the entire sample, capturing the overall effect of interest) and random effects of both subjects and stimuli (which model variation in responses due to differences between individual subjects and items, allowing generalization beyond the sample) was fitted to tRSA estimates via the `lme4 1.1-35.3` package in R (Bates et al., 2015), and p-values were estimated using Satterthwaite’s method via the `lmerTest 3.1-3` package (Kuznetsova et al., 2017)”.

      (7) I could not find any statement on data availability or code availability. Given that the manuscript reuses prior data and proposes a new method, making data and code/tutorials openly available would greatly enhance the potential impact and utility for the community.

      We thank the reviewer for raising our oversight here. We have added our code and data availability statements.

      Page 9. “Data is available upon request to the corresponding author and our simulations and example tRSA code are available at https://github.com/electricdinolab”.

      Reviewer #1 (Recommendations for the authors):

      (13) Page 4: The limitations of cRSA seem to be based on the assumption that within each different experimental condition, there are different stimuli, which get combined into the condition. The framework of RSA, however, does not dictate whether you calculate a condition x condition RDM or a larger and more complete stimulus x stimulus RDM. Indeed, in practice we often do the latter? Or are you assuming that each stimulus is only shown once overall? It would be useful at this point to spell out these implicit assumptions.

      We agree that stimulus x stimulus RDMs can be constructed and are often used. However, as we mentioned in the Introduction, researchers are often interested in the difference between two (or more) conditions, such as “remembered” vs. “forgotten” (Davis et al., https://doi.org/10.1093/cercor/bhaa269) or “high cognitive load” vs. “low cognitive load” (Beynel et al., https://doi.org/10.1523/JNEUROSCI.0531-20.2020). In those cases, the most common practice with cRSA is to construct condition-specific RDMs, compute cRSA scores separately for each condition, and then compare the scores at the group level. The number of times each stimulus gets presented does not prevent one from creating a model RDM that has the same rows and columns as the brain RDM, either in the same condition (“high load”) or across different conditions.

      (14) Page 5: The difference between condition-level and stimulus-level is not clear. Indeed, this definition seems to be a function of the exact experimental design and is certainly up for interpretation. For example, if I conduct a study looking at the activity patterns for 4 different hand actions, each repeated multiple times, are these actions considered stimuli or conditions?

      We have added clarifying language about what is considered stimuli vs. conditions. Indeed, this will depend on the specific research questions being asked and will affect how researchers construct their models. In this specific example, one would most likely consider each different hand action a condition, treating them as fixed effects rather than random effects, given their very limited number and the lack of need to generalize findings to the broader “hand actions” category.

      Page 5. “Critically, the distinction between condition-level and stimulus level is not always clear as researchers may manipulate stimulus-level features themselves. In these cases, what researchers ultimately consider condition-level and stimulus-level will depend on their specific research questions. For example, researchers intending to study generalized object representation may consider object category a stimulus-level feature, while researchers interested in if/how object representation varies by category may consider the same category variable condition-level”.

      (15) Page 5: The fact that different numbers of trials / different levels of measurement noise / noise-covariance of different conditions biases non-cross-validated distances is well known and repeatedly expressed in the literature. We have shown that cross-validation of distances effectively removes such biases - of course, it does not remove the increased estimation variability of these distances (for a formal analysis of estimation noise on condition patterns and variance of the cross-nobis estimator, see (Diedrichsen et al. 2021)).

      We thank the reviewer for drawing our attention to this literature and have added discussions of these methods.
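
The reviewer's point about bias can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the crossnobis estimator itself: it uses plain squared Euclidean distances without noise normalization, two conditions with identical true patterns, two independent data partitions, and made-up sizes and noise levels.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim, n_voxel, noise_sd = 2000, 100, 1.0  # illustrative values

# True patterns of conditions A and B are identical (all zeros), so the
# true distance is 0. Axes: simulation x partition x condition x voxel.
estimates = rng.normal(0.0, noise_sd, size=(n_sim, 2, 2, n_voxel))
diff = estimates[:, :, 0, :] - estimates[:, :, 1, :]  # A minus B per partition

# Non-cross-validated: the difference is multiplied with itself, so the
# noise variance adds a positive bias.
plain = np.mean(diff[:, 0] * diff[:, 0], axis=1)

# Cross-validated: differences from independent partitions are multiplied,
# so the noise terms cancel in expectation.
crossval = np.mean(diff[:, 0] * diff[:, 1], axis=1)

mean_plain = plain.mean()        # close to 2 * noise_sd**2
mean_crossval = crossval.mean()  # close to 0
```

As the reviewer notes, cross-validation removes the bias but not the estimation variability: individual cross-validated values still scatter around zero and can be negative.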

      (16). Page 5: "Most studies present subjects with a fixed set of stimuli, which are supposedly samples representative of some broader category". This may be the case for a certain type of RSA experiments in the visual domain, but it would be unfair to say that this is a feature of RSA studies in general. In most studies I have been involved in, we use a "stimulus" x "stimulus" RDM.

      We have edited this sentence to avoid the “most” characterization. We also added substantial text to the Introduction and Discussion distinguishing cRSA (which is nonetheless widely employed, especially in cases with a single repetition per stimulus; Macklin et al., 2023; Liu et al., 2024) from the model-comparative method, and explicitly stating that we do not consider tRSA an alternative to the model-comparative approach.

      (17). Page 5: I agree that "stimuli" should ideally be considered a random effect if "stimuli" can be thought of as sampled from a larger population and one wants to make inferences about that larger population. Sometimes stimuli/conditions are more appropriately considered a fixed effect (for example, when studying the response to stimulation of the 5 fingers of the right hand). Techniques to consider stimuli/conditions as a random effect have been published by the group of Niko Kriegeskorte (Schütt et al. 2023).

      Indeed, in some cases what may be thought of as “stimuli” would be more appropriately entered into the model as a fixed effect; such questions are increasingly relevant given the focus on item-wise stimulus properties (Bainbridge et al., Westfall & Yarkoni). We have added text on this issue to the Discussion and caution researchers to employ models that most directly answer their research questions.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question. An effect is fixed when the levels represent the specific conditions of theoretical interest (e.g., task condition) and the goal is to estimate and interpret those differences directly. In contrast, an effect is random when the levels are sampled from a broader population (e.g., subjects) and the goal is to account for their variability while generalizing beyond the sample tested. Note that the same variable (e.g., stimuli) may be considered fixed or random depending on the research questions”.

      (18) Page 6: It is correct that the "classical" RSA depends on a categorical assignment of different trials to different stimuli/conditions, such that a stimulus x stimulus RDM can be computed. However, both Pattern Component Modelling (PCM) and Encoding models are ideally set up to deal with variables that vary continuously on a trial-by-trial or moment-by-moment basis. tRSA should be compared to these approaches, or - as it should be clarified - that the problem setting is actually quite a different one.

      We agree that PCM and encoding models offer a flexible approach and handle continuous trial-by-trial variables. We have clarified on page 6 that the problem setting of cRSA is distinct, and we have added discussion of the robustness of encoding models and their limitations to the Discussion.

      Page 6. “While other approaches such as Pattern Component Modeling (PCM) (Diedrichsen et al., 2018) and encoding models (Naselaris et al., 2011) are well-suited to analyzing variables that vary continuously on a trial-by-trial or moment-by-moment basis, these frameworks address different inferential goals. Specifically, PCM and encoding models focus on estimating variance components or predicting activation from features, while cRSA is designed to evaluate representational geometry. Thus, cRSA as well as our proposed approach address a problem setting distinct from PCM and encoding models”.

      (19) Page 8: "Then, we generated two noise patterns, which were controlled by parameters 𝜎𝐴 and 𝜎𝐵, respectively, one for each condition." This makes little sense to me. The noise patterns should be unique to each trial - you should generate n_a + n_b noise patterns, no?

      We clarify that the “noise patterns” here are n_voxel x n_trial in size; in other words, all trial-level noise patterns are generated together and each trial has its own unique noise pattern. We have revised our description to “two sets of noise patterns” for clarity, starting on page 9.
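
A minimal sketch of what “two sets of noise patterns” could look like, assuming independent Gaussian noise; the sizes, the noise levels, and the variable names are illustrative choices, not values taken from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxel, n_trial_a, n_trial_b = 100, 40, 60  # illustrative sizes
s_a, s_b = 1.0, 2.0                          # illustrative noise SDs per condition

# One set of noise patterns per condition, each n_voxel x n_trial:
# every column is the unique noise pattern of a single trial.
noise_a = rng.normal(0.0, s_a, size=(n_voxel, n_trial_a))
noise_b = rng.normal(0.0, s_b, size=(n_voxel, n_trial_b))

# All n_a + n_b trial-level noise patterns side by side.
noise = np.hstack([noise_a, noise_b])
```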

      (20) Page 9: First, I assume if this is supposed to be a hierarchical level model, the "noise parameters" here correspond to variances? Or do these \sigma values mean to signify standard deviations? The latter would make little sense. Or is it the noise pattern itself?

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (21) Page 10: your formula states "𝜎<sub>𝑠𝑢𝑏𝑗</sub>~ 𝙽(0, 0.5^2)". This conflicts with your previous mention that the \sigmas are noise "levels"; are they the noise patterns themselves now? Variances cannot be normally distributed, as they cannot be negative.

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (22) Page 13: What was the task of the subject in the Memory retrieval task? Old/new judgements relative to encoding of object perception?

      We apologize for the lack of clarity about the Memory Retrieval task and have added that information and clarified that the old/new judgements were relative to a separate encoding phase, the brain data for which has been reported elsewhere.

      Page 14. “Memory Retrieval took place one day after Memory Encoding and involved testing participants’ memory of the objects seen in the Encoding phase. Neural data during the Encoding phase has been reported elsewhere. In the main Memory Retrieval task, participants were presented with 144 labels of real-world objects, of which 114 were labels for previously seen objects and 30 were unrelated novel distractors. Participants performed old/new judgements, as well as their confidence in those judgements on a four-point scale (1 = Definitely New, 2 = Probably New, 3 = Probably Old, 4 = Definitely Old)”.

      (23) Page 13: If "Memory Retrieval consisted of three scanning runs", then some of the stimulus x stimulus correlations for the RSM must have been calculated within a run and some between runs, correct? Given that all within-run estimates share a common baseline, they share some dependence. Was there a systematic difference between the within-run and the between-run correlations?

      We have clarified in this portion of the Methods that within-run comparisons were excluded from our analyses. We also double-checked that the within-run exclusion was included in the description of the Neural RSMs.

      Page 14. “Retrieval consisted of three scanning runs, each with 38 trials, lasting approximately 9 minutes and 12 seconds (within-run comparisons were later excluded from RSA analyses)”.

      Page 18. “This was done by vectorizing the voxel-level activation values within each region and calculating their correlations using Pearson’s r, excluding all within-run comparisons.”
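
One way to implement the exclusion described above is to mask every RSM cell whose two trials come from the same run. The sketch below uses random data in place of real activation patterns; the array names and the use of NaN as the exclusion marker are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, trials_per_run, n_voxel = 3, 38, 50
n_trial = n_runs * trials_per_run

# Random stand-in for voxel-level activation values (trials x voxels).
patterns = rng.normal(size=(n_trial, n_voxel))
run_labels = np.repeat(np.arange(n_runs), trials_per_run)

# Pearson's r between every pair of trial patterns.
rsm = np.corrcoef(patterns)

# Mark within-run cells (which include the diagonal) as excluded.
same_run = run_labels[:, None] == run_labels[None, :]
rsm_between = np.where(same_run, np.nan, rsm)
```

Downstream computations then need to skip the NaN cells, for example by indexing with `~np.isnan(...)`.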

      (24) Page 20: It is not clear why the mean estimate of "representational strength" (i.e., model-brain RSM correlations) is important at all. This comes back to Major point #2, namely that you are trying to solve a very different problem from model-comparative RSA.

      We have clarified that our approach is not an alternative to model-comparative RSA, and that depending on the task constraints researchers may choose to compare models with tRSA or other approaches requiring stimulus repetition (see 3).

      (25) Page 21: I believe the problems of simulating correlation matrices directly in the way that the authors in their first simulation did should be well known and should be moved to an appendix at best. Better yet, the authors could start with the correct simulation right away.

      We agree the paper is more concise with these simulations being moved to the appendix and more briefly discussed. We have implemented these changes (Appendix 1). However, we are not certain that this problem is unknown, and have several anecdotes of researchers inquiring about this “alternative” approach in talks with colleagues, thus we do still discuss the issues with this method.

      (26) Page 26: Is the "underlying continuous noise variable 𝜎𝑡𝑟𝑖𝑎𝑙 that was measured by 𝑣𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑 " the variance of the noise pattern or the noise pattern itself? What does it mean it was "measured" - how?

      𝜎𝑡𝑟𝑖𝑎𝑙 is a vector of standard deviations for different trials, and 𝜎𝑡𝑟𝑖𝑎𝑙 i would be used to generate the noise patterns for trial i. v_measured is a hypothetical measurement of trial-level variability, such as “memorability” or “heartbeat variability”. We have revised our description to clarify our methods.
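
A sketch of how a vector of trial-wise standard deviations can drive the noise patterns. The uniform range for the standard deviations and the way `v_measured` is derived (true value plus measurement error) are assumptions for illustration, not the manuscript's generative settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxel, n_trial = 100, 50  # illustrative sizes

# One standard deviation per trial (illustrative range).
sigma_trial = rng.uniform(0.5, 2.0, size=n_trial)

# Broadcasting scales column i by sigma_trial[i], so trial i gets a
# noise pattern with standard deviation sigma_trial[i].
noise = rng.normal(0.0, 1.0, size=(n_voxel, n_trial)) * sigma_trial

# Hypothetical noisy measurement of the trial-level variability.
v_measured = sigma_trial + rng.normal(0.0, 0.1, size=n_trial)

# Empirical per-trial noise SD, for checking against sigma_trial.
empirical_sd = noise.std(axis=0)
```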

      Reviewer #2 (Recommendations for the authors):

      (8) It would be helpful to provide more clarity earlier on in the manuscript on what is a 'trial': in my experience, a row or column of the RDM is usually referred to as 'stimulus condition', which is typically estimated on multiple trials (instances or repeats) of that stimulus condition (or exemplars from that stimulus class) being presented to the subject. Here, a 'trial' is both one measurement (i.e., single, individual presentation of a stimulus) and also an entry in the RDM, but is this the most typical scenario for cRSA? There is a section in the Discussion that discusses repetitions, but I would welcome more clarity on this from the get-go.

      We have added discussion of stimulus repetition methods and datasets to the Introduction and clarified our use of the terms.

      Page 8. “Critically, in single-presentation designs, a “trial” refers to one stimulus presentation, and corresponds to a row or column in the RSM. In studies with repeated stimuli, these rows are often called “conditions” and may reflect aggregated patterns across trials. tRSA is compatible with both cases: whether rows represent individual trials or averaged trials that create “conditions”, tRSA estimates are computed at the row level”.
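
To make “computed at the row level” concrete, the sketch below correlates each row of a neural RSM with the matching row of a model RSM, skipping the diagonal and any excluded (NaN) cells. It is one plausible reading of the description, not the authors' released code: the function name is hypothetical, Pearson correlation is an assumed choice of metric, and the RSMs are simulated.

```python
import numpy as np

def row_level_estimates(neural_rsm, model_rsm):
    """Correlate each row of the neural RSM with the same row of the model RSM."""
    n = neural_rsm.shape[0]
    estimates = np.full(n, np.nan)
    for i in range(n):
        keep = np.arange(n) != i          # drop the diagonal cell
        keep &= ~np.isnan(neural_rsm[i])  # drop excluded cells, if any
        estimates[i] = np.corrcoef(neural_rsm[i, keep], model_rsm[i, keep])[0, 1]
    return estimates

rng = np.random.default_rng(0)
features = rng.normal(size=(20, 5))  # made-up model features, one row per trial
model_rsm = np.corrcoef(features)

# Simulated neural RSM: the model RSM plus symmetric noise.
noise = rng.normal(0.0, 0.1, size=(20, 20))
neural_rsm = model_rsm + (noise + noise.T) / 2

estimates = row_level_estimates(neural_rsm, model_rsm)  # one value per row
```

Each row yields one estimate, so the output has one value per trial (or per condition, when rows are averaged patterns), which is what allows the estimates to enter a second-level model.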

      (9) The quality of the results figures can be improved. For example, axes labels are hard to read in Figure 3A/B, panels 3C/D are hard to read in general. In Figure 7E, it's not possible to identify the 'dark red' brain regions in addition to the light red ones.

      We thank the reviewer for raising these issues and have edited the figures to be more readable in the manner suggested.

      (10) I would be interested to see a comparison between tRSA and cRSA in other fMRI (or other modality) datasets that have been extensively reported in the literature. These could be the original Kriegeskorte 96 stimulus monkey/fMRI datasets, commonly used open datasets in visual perception (e.g., THINGS, NSD), or the above-mentioned King et al. dataset, which has been analyzed in various papers.

      We recognize the great utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (11) On P39, the authors suggest 'researchers can confidently replace their existing cRSA analysis with tRSA': Please discuss/comment on how researchers should navigate the choice of modeling parameters in tRSA's linear mixed effects setting.

      We have added discussion of the mixed-effects parameters and encourage researchers to follow best practices for their model selection.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (12) The final part of the Results section, demonstrating the tRSA results for the continuous memorability factor in the real fMRI data, could benefit from some substantiation/elaboration. It wasn't clear to me, for example, to what extent the observed significant association between representational strength and item memorability in this dataset is to be 'believed'; the Discussion section (p38). Was there any evidence in the original paper for this association? Or do we just assume this is likely true in the brain, based on prior literature by e.g. Bainbridge et al (who probably did not use tRSA but rather classic methods)?

      Indeed, memorability effects have been replicated in the literature, but not using the tRSA method. We have expanded our discussion to clarify the relationship of our findings and the relevant literature and methods it has employed.

      Page 38. “Critically, memorability is a robust stimulus property that is consistent across participants and paradigms (Bainbridge, 2022). Moreover, object memorability effects have been replicated using a variety of methods aside from tRSA, including univariate analyses and representational analyses of neural activity patterns where trial-level neural activity pattern estimates are correlated directly with object memorability (Slayton et al, 2025).”

      (13) The abstract could benefit from more nuance; I'm not sure if RSA can indeed be said to be 'the principal method', and whether it's about assessing 'quality' of representations (more commonly, the term 'geometry' or 'structure' is used).

      We have edited the abstract to reflect the true nuance in the current approaches.

      Abstract. Neural representation refers to the brain activity that stands in for one’s cognitive experience, and in cognitive neuroscience, a prominent method of studying neural representations is representational similarity analysis (RSA). While there are several recent advances in RSA, the classic RSA (cRSA) approach examines the structure of representations across numerous items by assessing the correspondence between two representational similarity matrices (RSMs): usually one based on a theoretical model of stimulus similarity and the other based on similarity in measured neural data.

      (14) RSA is also not necessarily about models vs. neural data; it can also be between two neural systems (e.g., monkey vs. human as in Kriegeskorte et al., 2008) or model systems (see Sucholutsky et al., 2023). This statement is also repeated in the Introduction paragraph 1 (later on, it is correctly stated that comparing brain vs. model is most likely the 'most common' approach).

      We have added these examples in our introduction to RSA.

      Page 3. “One of the central approaches for evaluating information represented in the brain is representational similarity analysis (RSA), an analytical approach that queries the representational geometry of the brain in terms of its alignment with the representational geometry of some cognitive model (Kriegeskorte et al., 2008; Kriegeskorte & Kievit, 2013), or, in some cases, compares the representational geometry of two neural systems (e.g., Kriegeskorte et al., 2008) or two model systems (Sucholutsky et al., 2023)”.

      (15) 'theoretically appropriate' is an ambiguous statement, appropriate for what theory?

      We apologize for the ambiguous wording, and have corrected the text:

      Page 11. “Critically, tRSA estimates were submitted to a mixed-effects model which is statistically appropriate for modeling the hierarchical structure of the data, where observations are nested within both subjects and stimuli (Baayen et al., 2008; Chen et al., 2021)”.

      (16) I found the statement that cRSA "cannot model representation at the level of individual trials" confusing, as it made me think, what prohibits one from creating an RDM based on single-trial responses? Later on, I understood that what the authors are trying to say here (I think) is that cRSA cannot weigh the contributions of individual rows/columns to the overall representational strength differently.

      We thank the reviewer for their clarifying language and have added it to this section of the manuscript.

      “Abstract. However, because cRSA cannot weigh the contributions of individual trials (RSM rows/columns), it is fundamentally limited in its ability to assess subject-, stimulus-, and trial-level variances that all influence representation”.

      (17) Why use "RSM" instead of "RDM"? If the pairwise comparison metric is distance-based (e..g, 1-correlation as described by the authors), RDM is more appropriate.

      We apologize for the error, and have clarified the Methods text:

      Pages 3-4. “First, brain activity responses to a series of N trials are compared against each other (typically using Pearson’s r) to form an N×N representational similarity matrix”.

      (18) Figure 2: please write 'Correlation estimate' in the y-axis label rather than 'Estimate'.

      We have edited the label in Figure 2.

      (19) Page 6 'leaving uncertain the directionality of any findings' - I do not follow this argument. Obviously one can generate an RDM or RSM from vector v or vector -v. How does that invalidate drawing conclusions where one e.g., partials out the (dis)similarity in e.g., pleasantness ratings out of another RDM/RSM of interest?

      We agree such an approach does not invalidate the partial method; we have clarified what we mean by “directionality”.

      Page 8. “For instance, even though a univariate random variable 𝑣, such as pleasantness ratings, can be conveniently converted to an RSM using pairwise distance metrics (Weaverdyck et al., 2020), the very same RSM would also be derived from the opposite random variable −𝑣, leaving uncertain the directionality (i.e., whether representation is strongest for pleasant or unpleasant items) of any findings with the RSM (see also Bainbridge & Rissman, 2018)”.
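
The sign ambiguity is easy to verify numerically. In the sketch below, the ratings are made-up values and the negative absolute difference is just one example of a pairwise distance-based similarity.

```python
import numpy as np

def similarity_from_variable(values):
    """Pairwise similarity defined as the negative absolute difference."""
    values = np.asarray(values, dtype=float)
    return -np.abs(values[:, None] - values[None, :])

ratings = np.array([1.0, 2.0, 4.0, 7.0])  # made-up pleasantness ratings

rsm_original = similarity_from_variable(ratings)
rsm_flipped = similarity_from_variable(-ratings)

# Both matrices are identical, so the RSM alone cannot tell whether an
# effect is driven by high or by low values of the variable.
identical = np.array_equal(rsm_original, rsm_flipped)
```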

      (20) P7 'sampled 19900 pairs of values from a bi-variate normal distribution', but the rows/columns in an RDM are not independent samples - shouldn't this be included in the simulation? I.e., shouldn't you simulate first the n=200 vectors, and then draw samples from those, as in the next analysis?

      This section has been moved to Appendix 1 (see responses to Reviewer 1.13).

      (21) Under data acquisition, please state explicitly that the paper is re-using data from prior experiments, rather than collecting data anew for validating tRSA.

      We have clarified this in the data acquisition section.

      Page 13. “A pre-existing dataset was analyzed to evaluate tRSA. Main study findings have been reported elsewhere (S. Huang, Bogdan, et al., 2024)”.

      (22) Figure 4 could benefit from some more explanation in-text. It wasn't clear to me, for example, how to interpret the asterisks depicted in the right part of the figure.

      We have clarified the meaning of the asterisks in the main text, in addition to the existing text in the figure caption.

      Page 26. “see Figure 4, off-diagonal cells in blue; asterisks indicate where tRSA was statistically more sensitive than cRSA)”.

      (23) Page 38 "the outcome of tRSA's improved characterization can be seen in multiple empirical outcomes:" it seems there is one mention of 'outcomes' too many here.

      We have revised this sentence.

      Page 41. “tRSA's improved characterization can be seen in multiple empirical outcomes”.

      (24) Page 38 "model fits became the strongest" it's not clear what aspect of the reported results in the paragraph before this is referring to - the Appendix?

      Yes, the model fits are in the Appendix; we have added this in-text citation.

      Moreover, model-fits became the strongest when the models also incorporated trial-level variables such as fMRI run and reaction time (Appendix 3, Table 6).

      References

      Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789

      Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

      Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.

      Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. eLife, 12. https://doi.org/10.7554/eLife.82566

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.

      King, M. L., Groen, I. I., Steel, A., Kravitz, D. J., & Baker, C. I. (2019). Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage, 197, 368-382.

      Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., ... & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.

      Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., ... & Griffiths, T. L. (2023). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.

    2. Reviewer #2 (Public review):

      This paper proposes two changes to classic RSA, a popular method to probe neural representation in neuroimaging experiments: computing RSA at row/column level of RDM, and using linear mixed modeling to compute second level statistics, using the individual row/columns to estimate a random effect of stimulus. The benefit of the new method is demonstrated using simulations and a re-analysis of a prior fMRI dataset on object perception and memory encoding.

      The authors' claim that tRSA is a promising approach to perform more complete modeling of cogneuro data, and to conceptualize representation at the single trial/event level (cf. Discussion section on P42), is appealing.

      In their revised manuscript, the authors have addressed some previous concerns, now referencing more literature aiming to improve RSA and its associated statistical inferences, and providing more guidance on methodological considerations in the Discussion. However, I wish the authors had more extensively edited the Introduction to better contextualize the work and clarify the specific settings in which they see the method as being beneficial over classic RSA. For example, some of the limitations of cRSA mentioned on page 6, e.g. related to presenting the same stimuli to multiple subjects, seem to be quite specific to settings where the researcher expects differential responses across subjects to fundamentally alter the interpretation, rather than something that will just average out by repeatedly offering the same stimulus, or combining data across subjects. It's not clear to me how the switch from 'matrix-level' to 'row-level' analysis in tRSA necessarily addresses this problem. It would be very helpful if the authors would more explicitly outline what problem the row-level aspect of tRSA is solving; what problem statistical inference via LMM is solving; and walk the reader through a very specific use case (perhaps a toy version of the real-data experiment which is now at the end of the paper). Explaining the utility of tRSA for experimental settings in which assessing representational strength for single events is crucial would clarify the contribution of this new method better.

      A few weaknesses mentioned in my previous review were not adequately addressed. To demonstrate the utility of the method on real neural recordings, only a single dataset is used with a quite complicated experimental design; it's not clear if there is any benefit of using tRSA on a simpler real dataset. Moreover, the cells of an RDM/RSM reflect pairwise comparisons between response patterns. Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. While the authors show examples in which failure to meet independence assumptions does not affect results in their specific dataset, it does not get acknowledged as a problem at a more fundamental level. Finally, while the paper now states that 'simulations and example tRSA code' are publicly available, the link points to the lab's general GitHub page containing many lab repositories, in which I could not identify a specific repository related to this paper. This is disappointing given that the main goal of this manuscript is to provide a new method that they encourage others to use; a clear pointer to available code is only a minimal requirement to achieve that goal. A dedicated repository, including documentation, READMEs and tutorials/demos to run simulations, compare methods, etc. would greatly enhance the paper's contribution.
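      The row-level idea summarized in this review can be illustrated with a small sketch. It assumes that "row-level" means correlating each stimulus's row of the neural similarity matrix with the matching row of a model matrix, diagonal excluded, which gives one score per stimulus. The use of Pearson correlation, the function name `row_level_rsa`, and the simulated data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def row_level_rsa(neural_rsm, model_rsm):
    """Correlate each row of a neural RSM with the same row of a model RSM.

    The diagonal (self-similarity) is excluded. Returns one Pearson
    correlation per stimulus. Hypothetical helper for illustration only.
    """
    n = neural_rsm.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    scores = np.empty(n)
    for i in range(n):
        mask = off_diag[i]
        scores[i] = np.corrcoef(neural_rsm[i, mask], model_rsm[i, mask])[0, 1]
    return scores


rng = np.random.default_rng(0)
patterns = rng.standard_normal((20, 50))   # 20 simulated stimuli x 50 voxels
neural_rsm = np.corrcoef(patterns)         # stimulus-by-stimulus similarity

# Sanity check: a model identical to the neural RSM fits every row perfectly.
perfect_scores = row_level_rsa(neural_rsm, neural_rsm.copy())

# A noisy model gives one imperfect score per stimulus.
noisy_model = neural_rsm + 0.5 * rng.standard_normal(neural_rsm.shape)
noisy_scores = row_level_rsa(neural_rsm, noisy_model)
```

      Per-stimulus scores of this kind are what a mixed model with a random effect of stimulus could take as input, as the review describes; that second-level step is not shown here.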

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This study builds upon previous work in schizophrenia and other disorders using fibroblasts derived from patients, assessing mitochondrial phenotypes and then using these to identify compounds which reverse these phenotypes. The study is one of the largest of its kind performed to date, with 168 patients included. The authors undertake mitochondrial phenotyping and machine learning of the outputted images to segregate the patients based on clinical features and the associated cellular phenotype. The authors then go on to virtually screen publicly available datasets of cancer cells treated with compounds and also genetic modulations. In doing so, they can identify compounds which modulate the phenotypes and therefore might be of value to test in the patient-derived lines. The study has strengths in the large number of samples, the advanced machine learning and the virtual screening. Furthermore, the authors highlight and discuss the limitations of the study well. There are some weaknesses which the authors can address. Firstly, although the introduction is comprehensive in some areas, in others (for example, outlining the fibroblast mitochondrial phenotype and the use of patient fibroblasts to identify compounds) significant literature is missing, particularly in Parkinson's Disease, where screening in fibroblasts has resulted in compounds entering Phase 3 clinical trials. In addition, the studies using 100 or more PD patient fibroblast lines for phenotyping and patient stratification have not been included. It would be useful if the authors could comment on the robustness of the phenotypes identified in the fibroblasts over multiple passages. This is important when considering the biological and disease relevance of the phenotypes, and it is not something the authors show or comment on.

      In discussing the genetic manipulations, it would be useful to comment in more detail on the genes identified, particularly those which are not known to be associated with changes in mitochondrial phenotypes.

      Significance

      This study builds on work from multiple labs investigating the utility of fibroblasts to identify phenotypes and find potential novel therapeutics. The size of the cohort and the advanced machine learning methods are a particular strength and this advances the field in this area. The availability of the data and code is a strength to allow others to replicate the findings. The lack of experimental validation of any of the compounds or genes identified by the virtual screening is a weakness which could be addressed.

    1. Parent-School Partnership: A Pillar of Academic Success

      Analytical Summary

      This summary document analyzes the key points of the conference organized by Parents Partenaires en Éducation (PPE) Ontario on the crucial importance of the partnership between families and schools.

      The central message is that student success does not rest on the school alone, but on close, proactive collaboration in which parents act as "co-educators."

      Parental engagement is structured around three dimensions: personal investment, cognitive investment, and institutional engagement.

      For families, particularly immigrant families, this involvement is a major lever for deconstructing unconscious biases, affirming cultural identity, and ensuring successful integration.

      The analysis shows that inclusion is a deliberate choice and that a sense of belonging can emerge only when parents' voices actively take part in decision-making within school councils and committees.

      --------------------------------------------------------------------------------

      1. Conceptual Framework of Parental Engagement

      Parental engagement is not limited to supervising homework; it is a multidimensional investment that directly influences the child's academic performance and socio-emotional well-being.

      The Three Dimensions of Engagement

      According to the scientific literature cited, engagement breaks down as follows:

      | Dimension | Description | Concrete examples |
      | --- | --- | --- |
      | Personal investment | Aspirations and interest shown in the child's school life. | Conversations about the day; interest in classmates and activities. |
      | Cognitive investment | Support with tasks and respect for school structures. | Supervising homework, visiting the library, following rules (e.g., use of electronic devices). |
      | Institutional engagement | Actual presence and participation in decision-making. | Taking part in school councils, parent committees, meetings, and active volunteering. |

      --------------------------------------------------------------------------------

      2. Identity and Values: Foundations of the Partnership

      Parents' identity and values must not be left at the school door. They are the filters through which the partnership is expressed.

      Identity as a decoding tool: The school system needs to know families' sociocultural identity in order to adapt the services it offers (teachers, social workers).

      Decolonizing the mind: For immigrant parents, it is essential to articulate their identity in the face of culture shock and to value their origins so that the child feels safe in the school environment.

      The values filter: Major decisions about the child's education must be passed through the filter of family values. Involvement in school councils makes it possible to challenge the "one size fits all" approach of school policies.

      --------------------------------------------------------------------------------

      3. Analysis of the Benefits of Collaboration

      Collaboration between parents and the school creates a "win-win" dynamic for all stakeholders.

      For the Student

      Stronger confidence: The child is proud to see their family involved and valued.

      Increased motivation: Parents' closeness stimulates the student's engagement in their own learning.

      Reduced bias: Close collaboration can change how school staff see the child, sometimes turning a negative perception (e.g., hyperactivity seen as a disorder) into recognition of positive traits (e.g., curiosity and creativity).

      For Parents

      Smoother communication: Direct exchanges with teachers make it easier to resolve problems quickly.

      Agents of change: Parents can influence policies (e.g., dress code, introduction of a uniform, financial literacy).

      Countering isolation: Involvement fosters social and cultural integration, especially for newcomers.

      For School Staff

      Better cultural understanding: Parents help teachers decode students' behaviour from a culturally appropriate angle.

      Operational support: Parent volunteering (e.g., accompanying a museum visit) enriches the educational experience.

      --------------------------------------------------------------------------------

      4. Diversity, Inclusion, and Belonging

      A crucial distinction is drawn among these three concepts to guide parental action:

      1. Diversity: A statistical fact (numbers, quotas, linguistic and cultural plurality).

      2. Inclusion: An individual and collective choice. It is the willingness to welcome others and to integrate actively.

      3. Belonging: The ultimate stage, reached only when minority voices are integrated into discussions and decision-making.

      --------------------------------------------------------------------------------

      5. Examples of Impact Through Proactive Engagement

      The source highlights several cases in which parental initiative transformed the school environment:

      Cultural adaptation: The proposal of a quiet corner for prayer allowed a student to practise his faith safely, aligning the values of home and school.

      Affirming identity: A session of African storytelling and dance transformed a student's perception of her traditional clothing, from shame to pride.

      Curricular innovation: One parent's initiative led a school council to adopt financial literacy as a priority.

      Strategic redirection: The close relationship between a mother and a teacher made it possible to redirect a student toward a program better suited to his profile (International Baccalaureate), thereby changing his academic trajectory.

      --------------------------------------------------------------------------------

      6. Conclusion and Call to Action

      The document concludes that lack of time is often a perceived rather than a real barrier. One hour a month given to the school council can be enough to have a positive influence.

      Key messages for the future:

      • Parents are the first educators; the school provides instruction, parents provide upbringing.

      • Parental involvement is the only effective way for the school system to know and respect the identity of the families it serves.

      • Every parent has the power to influence and must choose to be an agent of change to ensure a pluralistic society enriched by its differences.

    1. The following tweet has a video of a soap dispenser that apparently was only designed to work for people with light-colored skin

      This video (which I've seen before) makes me consider how hierarchies get baked not just into the code of the digital world, but into its very infrastructure. It's been well documented that, because cameras were designed by people with lighter skin, it was hard for Black people (especially in the age of film rather than digital photography) to find cameras and film stock that captured their skin tones. I wonder if something similar happens in the digital environment: whether the lack of coders and designers from the Global South leads to assumptions in digital infrastructure that go unnoticed and unchallenged.

    1. In addition to laws covering theft, false accusation, and temple robbery (all punishable by death), the code set rules for commerce, slavery, inheritance, and professional accountability (for example, #229: "If a builder builds a house for someone, and does not construct it properly, and the house which he built falls and kills its owner, then that builder shall be put to death.")

      This example shows how seriously they took their laws on accountability, and how firmly people were held accountable under their ideology.

    1. AI Doesn’t Reduce Work—It Intensifies It
      • Task Expansion & Role Blurring: AI lowers the barrier to entry for complex tasks, leading employees to take on work outside their core expertise. Product managers and designers are now writing code, while researchers take on engineering tasks.
      • Specialist Burden: This expansion creates a "cleanup" tax. For example, senior engineers now spend significant time reviewing, debugging, and mentoring colleagues who produce "vibe-coded" AI outputs, often through informal and unmanaged channels like Slack.
      • The "Ambient Work" Phenomenon: Because AI interactions feel conversational and "easy," work has become ambient. Employees find themselves prompting AI during lunch, between meetings, or late at night, eliminating natural mental downtime.
      • Intensified Multitasking: Workers are running multiple AI agents in parallel while simultaneously performing manual tasks. This creates a high sense of "momentum" but leads to extreme cognitive load and constant attention-switching.
      • The Productivity Trap: AI acts as a "partner" that makes revived or deferred tasks feel doable. This creates a flywheel where people don't work less; they simply take on more volume, leading to "unsustainable intensity" that managers often mistake for genuine productivity.
      • Sustainability Risks: The researchers warn that while AI feels like "play" initially, it eventually leads to cognitive fatigue, impaired decision-making, and burnout as the quiet increase in workload becomes overwhelming.

      Hacker News Discussion

      • Cognitive Fatigue: Users highlighted that "AI fatigue" is distinct from normal work tiredness. It stems from the "constant vigilance" required to audit AI output and the lack of a "flow state" due to unpredictable waiting times for generations.
      • Executive Function Strain: Commenters noted that managing autonomous agents is more exhausting than manual work. One user compared it to Level 3 autonomous driving—you aren't driving, but you must remain "fully hands-on" to ensure the AI doesn't touch the wrong files or hallucinate.
      • The Jevons Paradox: Several participants pointed out that as the "cost" of work decreases due to AI, the demand for work increases proportionally. Instead of saving time, workers are expected to triple their output, which leaves them more stressed than before.
      • Management Expectations: A common theme was that leadership often mandates AI usage and pre-supposes productivity gains, leaving no room for cases where AI makes work slower or lower quality. This forces employees to "perform" productivity while working longer hours.
      • Vibe Coding vs. Engineering: There is a heated debate between those who see "vibe coding" (prompt-heavy development) as a massive efficiency gain and veterans who argue it produces "average code" that becomes a maintenance nightmare in large, legacy codebases.
    1. eLife Assessment

      This valuable study reports results showing how different neurons in the dysgranular retrosplenial cortex code spatial orientation. Specifically, the paper reports that some neurons maintain tuning for a single head direction across multi-compartmental environments, while other neurons are tuned to different head directions that reflect the geometry within each compartment. The study was viewed as likely to expand the field's understanding of directional tuning of neurons, but incomplete evidence was provided to support the conclusions.

    1. It means to pay attention to the cause-and-effect coding of reality.

      Situational Awareness Checks

      Faith is not blindness; it is Hyper-Awareness. The "Sakal" operator does not ignore the problem; they decode it. If you are crashing into a wall, "believing harder" will not move the wall. You need to stop and read the code.

      Most Agents pray for the outcome (the win). The KFA protocol is to pray for the mechanic (the insight). This shifts the focus from "God, fix this" to "God, show me how this works."

    1. You cannot out-argue the loop; you must overwrite it with Source Code.

      Deploying the Call Sign

      In tactical operations, arguing with a broadcast is a waste of resources. You simply change the channel. The "Override Protocol" is a linguistic and physiological shift. By speaking your Call Sign (the truth of your identity), you are deploying a "Cognitive Reframing" technique that bypasses the emotional center of the brain.

      The Mission Directive is simple: Recognition and Replacement. When the HUD (Heads-Up Display) of your mind shows a red alert for "Unloved," you don't ask why you feel unloved. You execute the "Secure Attachment" code. This is not positive thinking; it is a Surgical Strike against a specific neural glitch.

    1. As a "small" sample, federal tax laws and regulations are now over ten million words long.6

      One problem that could be contributing to a lack of government revenue is how overly complicated the tax code is. Many corporations in particular use strategies built into the tax code to avoid a loss of profit, and that profit tends not to be reinvested. (More research required)

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      COMBINED REVIEW REPORTS

      __1.1. The biochemical and biophysical experiments performed in this study were well designed, data were clear and the conclusions were well supported by the results. One potential improvement is to check whether NLS could affect the normal activation targets of ΔNp63α, such as KRT14 and other epithelial genes. This could complement the experiments testing the inhibition effect of ΔNp63α on p53-mediated gene activation. This will be interesting, as ΔNp63α is a master regulator in epithelial cells via regulation of diverse epithelial genes. __

      We thank the Reviewer for this useful comment. To further investigate the relationship between p63 nuclear import and function, and the importance of the oligomerization-driven tolerance to point mutations in the latter, we have now performed a number of new experiments. First, we have included both the DNp63a NLSn and NLSc mutants in the DNA binding/p53-inhibition assays shown in original Figure 7. The new data are shown in Figure 4E and Supplementary Figure S5. As expected, these mutants had a much smaller effect on DNA binding/p53-inhibition than the NLSbip mutant, further establishing a functional link between p63 nuclear levels and transcriptional activity, and demonstrating the functional relevance of the compensatory mechanism evolved by p63 to tolerate mutations inactivating either NLSn or NLSc.

      In addition, and as specifically suggested by the Reviewer, we have measured the effect of NLS-impairing mutations on the ability of DNp63a to transactivate the K14 and the Bax promoters. Our results, shown in revised Figure 4F and 4G as well as in Supplementary Figure S6, clearly show that both the DNp63a NLSn and NLSc mutants transactivate the promoters at levels indistinguishable from the wild-type, consistent with their minimal effect on DNA binding and nuclear transport, while the NLSbip mutation, which prevents nuclear localization and DNA binding, also prevents transcriptional transactivation.

      __1.2. A minor suggestion: authors could consider using p63 rather than ΔNp63α in the manuscript. The heterogeneous sequences of NLS regions are relevant for the delta isoform of p63. In addition, all experiments performed in the study are not necessarily specific for the biology of the ΔNp63α isoform, but they are probably informative for all p63 isoforms. __

      We thank the Reviewer for this suggestion. Indeed, we expect the bipartite NLS to mediate nuclear transport of most p63 isoforms, whereas the p63 delta isoform, which lacks NLSn, would be transported into the nucleus by NLSc. We have modified the text in the Discussion section to make this point clearer and more explicit: "the bipartite NLS identified here is responsible for nuclear localization of most p63 isoforms, while p63 delta is transported into the nucleus by NLSc (SIKKRRSPD)." To further corroborate this statement, we have also included new data obtained with the TAp63a and gNp63a isoforms. Our data clearly show that nuclear import of both isoforms depends on the NLSbip identified here and is mediated by the IMPa/b1 heterodimer, so that the findings obtained for the ΔNp63α isoform can be generalized to others. The new data are shown in Figure 3 and in Supplementary Figure S3.

      __1.3. Another minor suggestion: As p63 forms a tetramer when binding to DNA sequence for gene regulation, it would be good for authors to speculate on the role of NLS and its variations in tetramerization. __

      We thank the Reviewer for this comment. Since the NLS is located outside of the tetramerization domain, it is not expected to play a direct role in tetramerization. We have addressed this issue by generating computational models of ΔNp63α and DNp63α;mNLS dimers and tetramers to allow a direct comparison. The new data are shown in Figure 5A-D and Supplementary Figure S11A-D. The data suggest that mutation of the NLS residues, which lie outside of the oligomerization domain, does not affect the oligomerization ability of ΔNp63α, supporting the experimental evidence from Figure 5E (BRET experiments).

      __2.1. In immunofluorescence images it is sometimes difficult to see nuclear accumulation. Single channels of the GFP signal may help to make the point. __

      We thank the Reviewer for pointing out this issue. We have provided single channels for every microscopic image in Supplemental Figures.

      __ 2.2. The binding assays in Fig. 3 would profit from using the most efficient imp a variant together with imp beta to show potential cooperative binding.__

      We thank the Reviewer for this comment, which helped enhance the physiological relevance of our binding data. We have now introduced the requested data in Supplementary Figure S2A. In the revised figure panel, we compared binding of FITC-labelled p63-NLS peptide to full-length IMPa1 alone, IMPa1DIBB, and the pre-heterodimerized IMPa1/IMPb1 complex. The data are consistent with a classical binding mode whereby IMPb1, by engaging the autoinhibitory IBB domain, frees the minor and major binding sites of full-length IMPa1. To corroborate our results further and demonstrate the bipartite nature of the p63 NLS identified here, we have also performed FP experiments with p63-NLS and the SV40 LTA NLS (a well-characterized monopartite NLS) in the presence of either wt IMPa1DIBB or its minor and major site mutants. As expected for a bipartite NLS, either mutation significantly impaired binding of p63-NLS, whereas mutation of the minor site had a much smaller effect on binding of the SV40 LTA NLS. The new data, shown in Supplementary Figure S2BC and Supplementary Table S3, confirm our hypothesis by highlighting a very strong reduction in the binding affinity of the p63 NLS peptide for the IMPa1 major site mutant (

      __2.3. please mention that NTR can also recognize 3D structures of structural RNAs, e.g. tRNAs or miRNAs __

      We thank the Reviewer for this very useful suggestion. We have now introduced this concept in the Introduction and added two references to support our statement. The paragraph is as follows: "Additionally, Exportin 5 and Exportin-T evolved to recognize specific RNA structures within pre-miRNAs and t-RNAs, respectively (5, 6)."

      2.4. longer TA isoforms

      We have corrected the typo, and we thank the Reviewer for noticing it.

      __ 2.5. homologues or orthologues? __

      We thank the Reviewer for pointing out this issue. We have corrected the text, so IMPas and members of the p53 family are now referred to as paralogs and not as orthologs.

      __3.1. The major function of DNp63a seems to be that of a bookmarking factor that ensures the establishment of an epithelial transcriptional program. It is found to bind more to enhancer than to promoter regions. While it might also act for a few genes as a classical transcription factor (K14), this bookmarking and interaction with other transcriptional regulators seems to be its major task. This should be included in the introduction. __

      We thank the Reviewer for this suggestion. The Introduction has been modified as requested to incorporate this important concept "Additionally, p63 has been shown to act as a pioneer factor, shaping the chromatin and enhancer landscape, thus regulating accessibility to activating and repressing transcription factors (18-20)."

      __ 3.2. "DNp63a can be imported into the nucleus as a dimer" What is the evidence that DNp63a is imported as a dimer and not as a tetramer? Although functionally not really relevant, because all conclusions drawn for a dimer are true for a tetramer (such as the mutation compensation), this statement (and others in the text) should either be substantiated or modified. __

      The Reviewer is correct in pointing out that, while p63 isoforms bind DNA as tetramers (7), the precise oligomeric state at which nuclear import occurs is not firmly established. Indeed, little is known about the regulation of the p63 oligomerization process during nucleocytoplasmic trafficking. While TA isoforms are generally maintained in an inactive, closed, dimeric conformation (requiring external stimuli such as phosphorylation to undergo activation and tetramerization), ΔNp63α has been reported to form tetramers even in the absence of such stimuli (4, 8). In light of this, we have modified the text to explicitly acknowledge the possibility that ΔNp63α may be transported into the nucleus either as a dimer or as a tetramer, rather than implying a single obligatory oligomeric state.

      Importantly, to directly address the Reviewer's concern, we have broadened the scope of the manuscript to include additional p63 isoforms, particularly TAp63α, which is predominantly present as a dimer under basal conditions. Our new data (Figure 3) demonstrate that TAp63α is efficiently translocated into the nucleus via the IMPα/β1 heterodimer in an NLSbip-dependent manner. Notably, despite its inability to form tetramers, TAp63α displays a similar tolerance to mutations that inactivate individual basic clusters within the bipartite NLS, analogous to what is observed for ΔNp63α (Supplementary Figure S11).

      Together, these results formally demonstrate that dimerization is sufficient to support efficient nuclear import in the presence of NLS-inactivating mutations, and that higher-order oligomerization (i.e., tetramerization) is not required for this property. We have therefore revised the manuscript accordingly to avoid over-interpretation and to more accurately reflect the experimental evidence.

      __ 3.3. The explanation for the difference in the sensitivity of mutations in the bipartite NLS in the isolated peptide experiments and experiments with the full length DNp63a is intriguing. Unfortunately, it is not based on direct experimental evidence. To proof their model (which is the central claim of this manuscript) they should fuse the bipartite NLS to any dimerization module (e.g. a leucine zipper sequence) and show that by dimerization of the bipartite NLS the same results towards mutations are obtained as for full length DNp63a. This would strongly support their model. __

      We agree that the model for nuclear transport is a central claim of our work and deserves additional experimental validation. To support our hypothesis, in the revised manuscript we have generated a number of additional DNp63a mutants incapable of self-interaction, based on deletion of residues 301-347 (p63-DOD).

      We have now:

      (i) Validated the inability of the DOD mutant to self-interact by means of BRET assays in living cells, whereby a strong decrease in BRET ratio is observed compared to wild-type DNp63a (New Figure 6E and New Supplementary Figure S8).

      (ii) Shown that, in this context, substitution of either the N-terminal or the C-terminal basic stretch of amino acids in the NLS is sufficient to impair p63 nuclear import, whereas in the context of the full-length protein it is not (New Figure 6F-H, and New Supplementary Figure S9).

      (iii) Shown that FLAG-p63 wt could relocalize YFP-p63mNLSbip to the nucleus, but not YFP-p63;DOD;mNLSbip (New Supplementary Figure S10).

      We believe that these new data further demonstrate the impact of p63 self-association on subcellular localization and strongly support our hypothesis. We greatly thank the Reviewer for their inspiring comment, which led to a significant improvement of our manuscript.

      References

      1. Lotz R, Osterburg C, Chaikuad A, Weber S, Akutsu M, Machel AC, et al. Alternative splicing in the DBD linker region of p63 modulates binding to DNA and iASPP in vitro. Cell Death Dis. 2025;16(1):4.

      2. Ciribilli Y, Monti P, Bisio A, Nguyen HT, Ethayathulla AS, Ramos A, et al. Transactivation specificity is conserved among p53 family proteins and depends on a response element sequence code. Nucleic Acids Res. 2013;41(18):8637-53.

      3. Monti P, Ciribilli Y, Bisio A, Foggetti G, Raimondi I, Campomenosi P, et al. ∆N-P63alpha and TA-P63alpha exhibit intrinsic differences in transactivation specificities that depend on distinct features of DNA target sites. Oncotarget. 2014;5(8):2116-30.

      4. Pitzius S, Osterburg C, Gebel J, Tascher G, Schafer B, Zhou H, et al. TA*p63 and GTAp63 achieve tighter transcriptional regulation in quality control by converting an inhibitory element into an additional transactivation domain. Cell Death Dis. 2019;10(10):686.

      5. Okada C, Yamashita E, Lee SJ, Shibata S, Katahira J, Nakagawa A, et al. A high-resolution structure of the pre-microRNA nuclear export machinery. Science. 2009;326(5957):1275-9.

      6. Kutay U, Lipowsky G, Izaurralde E, Bischoff FR, Schwarzmaier P, Hartmann E, et al. Identification of a tRNA-specific nuclear export receptor. Mol Cell. 1998;1(3):359-69.

      7. Enthart A, Klein C, Dehner A, Coles M, Gemmecker G, Kessler H, et al. Solution structure and binding specificity of the p63 DNA binding domain. Scientific Reports. 2016;6:26707.

      8. Deutsch GB, Zielonka EM, Coutandin D, Weber TA, Schafer B, Hannewald J, et al. DNA damage in oocytes induces a switch of the quality control factor TAp63alpha from dimer to tetramer. Cell. 2011;144(4):566-76.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lesser et al provide a comprehensive description of Drosophila wing proprioceptive sensory neurons at the electron microscopy resolution. This “tour-de-force” provides a strong foundation for future structural and functional research aimed at understanding wing motor control in Drosophila with implications for understanding wing control across other insects.

      Strengths:

      (1) The authors leverage previous research that described many of the fly wing proprioceptors, and combine this knowledge with EM connectome data such that they now provide a near-complete morphological description of all wing proprioceptors.

      (2) The authors cleverly leverage genetic tools and EM connectome data to tie the location of proprioceptors on the wings with axonal projections in the connectome. This enables them to both align with previous literature as well as make some novel claims.

      (3) In addition to providing a full description of wing proprioceptors, the authors also identified a novel population of sensors on the wing tegula that make direct connections with the B1 wing motor neurons, implicating the role of the tegula in wing movements that was previously underappreciated.

      (4) Despite being the most comprehensive description so far, it is reassuring that the authors clearly state the missing elements in the discussion.

      Weaknesses:

      (1) The authors do their main analysis on data from the FANC connectome but provide corresponding IDs for sensory neurons in the MANC connectome. I wonder how the connectivity matrix compares across FANC and MANC if the authors perform a similar analysis to the one they have done in Figure 2. This could be a valuable addition and potentially also pick up any sexual dimorphism.

      We agree that systematic comparisons will provide valuable insights as more connectome datasets become available. However, the primary goal of this study was to link central axon morphology with peripheral structures in the wing. We deliberately omitted more detailed and quantitative analyses of the downstream VNC circuitry, apart from providing a global view of the connectivity matrix and using it to cluster the sensory axon types. A more detailed and systematic comparison of wing sensorimotor circuit connectivity across different connectome datasets (FANC, MANC, BANC, IMAC) is the subject of ongoing work in our lab, which we feel is beyond the scope of this study. Here, we chose to match the wing proprioceptors to axons in MANC to demonstrate their stereotypy across individuals and to make them more accessible to other researchers. We found no obvious sexual dimorphism at the level of wing sensory neurons. We now note this in the Discussion.

      (2) The authors speculate about the presence of gap junctions based on the density of mitochondria. I’m not convinced about this, given that mitochondrial densities could reflect other things that correlate with energy demands in sub-compartments.

      We have moved speculation about mitochondria and gap junctions to the Discussion.

      (3) I’m intrigued by how the tegula CO is negative for iav. I wonder if authors tried other CO labeling genes like nompc. And what does this mean for the nature of this CO. Some more discussion on this anomaly would be helpful.

      Based on this suggestion, we have added an image showing that tegula CO neurons are labeled by nompC-Gal4.

      (4) The authors conclude there are no proprioceptive neurons in sclerite pterale C based on Chat-Gal4 expression analysis. It would be much more rigorous if authors also tried a pan-neuronal driver like nsyb/elav or other neurotransmitter drivers (Vglut, GAD, etc) to really rule this out. (I hope I didn’t miss this somewhere.)

      To address this, we imaged OK371-GFP, which labels glutamatergic neurons, in the wing and wing hinge. We saw expression in the wing, as others have reported (Neukomm et al., 2014), but we saw no expression at the wing hinge. Apart from a handful of glutamatergic gustatory neurons in the leg, we are not aware of any other sensory neurons in the fly that are not labeled by Chat-Gal4.

      Overall, I consider this an exceptional analysis that will be extremely valuable to the community.

      We sincerely appreciate the reviewer’s positive feedback.

      Reviewer #2 (Public review):

      Summary:

      Lesser et al. present an atlas of Drosophila wing sensory neurons. They proofread the axons of all sensory neurons in the wing nerve of an existing electron microscopy dataset, the female adult fly nerve cord (FANC) connectome. These reconstructed sensory axons were linked with light microscopy images of full-scale morphology to identify their origin in the periphery of the wing and encoded sensory modalities. The authors described the morphology and postsynaptic targets of proprioceptive neurons as well as previously unknown sensory neurons.

      Strengths:

      The authors present a valuable catalogue of wing sensory neurons, including previously undescribed sensory axons in the Drosophila wing. By providing both connectivity information with linked genetic drive lines, this research facilitates future work on the wing motor-sensory network and applications relating to Drosophila flight. The findings were linked to previous research as well as their putative role in the proprioceptive and nerve cord circuitry, providing testable hypotheses for future studies.

      Weaknesses:

      (1) With future use as an atlas, it should be noted that the evidence is based on sensory neurons on only one side of the nerve cord. Fruit flies have stereotyped left/right hemispheres in the brain and left/right hemisegments in the nerve cord. The comparison of left and right neurons of the nervous system can give a sense of how robust the morphological and connectivity findings are. Here, the authors have not compared the left and right side sensory axons from the wing nerve, leaving potential for developmental variability across samples and left/right hemisegments.

      The right ADMN nerve in the FANC dataset is partially severed, making left/right comparisons unreliable (see Azevedo 2024, Extended Data Figure 4). We have updated the text to explain this within the Methods section of the paper.

      (2) Not all links between the EM reconstructions and driver lines are convincing. To strengthen these, for all EM-LM matches in Figures 3-7, rotated views of the driver line (matching the rotated EM views) should be shown to provide a clearer comparison of the data. In particular, Figure 3G and Figure 7B are not very convincing based on the images shown. MCFO imaging of the driver lines in Figure 3G and 7B would make this position stronger if a clone that matches the EM reconstruction could be identified.

      Many of the z-stack images in the paper are from the Janelia FlyLight collection, and unfortunately their imaging parameters were not optimized for orthogonal views. Rotated views are blurry and not especially helpful for comparison to EM reconstruction. We now point out in the text that interested readers can access the z-stacks from FlyLight to see the dorsal-ventral projections.

      Regarding Figure 3G and 7B, we have added markers to the image with corresponding descriptions in the legend to guide the reader through the image of the busy driver line. Although these lines label many cells in the VNC as a whole, they sparsely label cells in the ADMN, making them nonetheless useful for identifying peripheral sensory neurons.

      (3) Figure 7B looks like the driver line might have stochastic expression in the sensory neuron, which further reduces confidence in the result shown in Figure 7C. Is this expression pattern in the wing consistently seen? Many split-GAL4s have stochastic expressions. The evidence would be strengthened if the authors presented multiple examples (~4-5) of each driver line’s expression pattern in the supplement.

      Figure 7B shows sparse labeling of the driver line using the MCFO technique, as specified in the legend. Its unilateral expression is therefore not due to stochastic expression of the Gal4 line. We have added the “MCFO” label to the image to clarify.

      (4) Certain claims in this work lack quantitative evidence. On line 128, for instance, “Overall, our comprehensive reconstruction revealed many morphological subgroups with overlapping postsynaptic partners, suggesting a high degree of integration within wing sensorimotor circuits.” If a claim of subgroups having shared postsynaptic partners is being made, there should have been quantitative evidence. For example, cosine similar amongst members of each group compared to the cosine similarity of shuffled/randomised sets of axons from different groups. The heat map of cosine similarity in Figure 2B alone is not sufficient.

      We agree that illustrating the extent of shared postsynaptic partners across subgroups strengthens this point. We added a visualization showing pairwise similarity scores for within- and between-cluster neuron pairs (Figure 2B inset). We also performed a permutation test to determine that within-cluster similarity is significantly higher than between clusters, and we report the test in the results as well as the figure legend. This analysis provides a more quantitative summary of the qualitative trends in connectivity that are summarized in Figure 2B.
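
      For readers unfamiliar with this kind of test, the following is a minimal sketch of a label-shuffling permutation test on pairwise cosine similarity. It is not the analysis code from the paper (that is in the linked repository). The test statistic (mean within-cluster minus mean between-cluster similarity), the toy synapse-count vectors, the cluster labels, and all function names are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def within_minus_between(vectors, labels):
    """Mean similarity of same-cluster pairs minus mean similarity of cross-cluster pairs."""
    within, between = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine_similarity(vectors[i], vectors[j])
        (within if labels[i] == labels[j] else between).append(sim)
    return sum(within) / len(within) - sum(between) / len(between)


def permutation_test(vectors, labels, n_permutations=2000, seed=0):
    """One-sided test: is the observed statistic larger than under shuffled labels?"""
    rng = random.Random(seed)
    observed = within_minus_between(vectors, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps cluster sizes, breaks axon-to-cluster assignment
        if within_minus_between(vectors, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical data: rows are axons, columns are synapse counts onto 4 targets.
vectors = [
    [10, 8, 0, 0], [9, 7, 1, 0], [11, 9, 0, 1], [8, 10, 1, 0], [12, 7, 0, 0],
    [0, 1, 10, 9], [1, 0, 8, 11], [0, 0, 9, 10], [1, 1, 11, 8], [0, 2, 9, 9],
]
labels = ["A"] * 5 + ["B"] * 5

observed, p_value = permutation_test(vectors, labels)
print(f"within - between = {observed:.3f}, p = {p_value:.4f}")
```

      Shuffling the labels rather than the vectors keeps cluster sizes fixed, so the null distribution reflects only the assignment of axons to clusters.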

      (5) Similarly, claims about putative electrical connections to b1 motor neurons are very speculative. The authors state that “their terminals contain very densely packed mitochondria compared to other cells”, without providing a quantitative comparison to other sensory axons. There is also no quantitative comparison to the one example of another putative electrical connection from the literature. Further, it should be noted that this connection from Trimarchi and Murphey, 1997, is also stated as putative on line 167, which further weakens this evidence. Quantification would strongly strengthen this position. Identification of an example of high mitochondrial density at a confirmed electrical connection would be even better. In the related discussion section “A potential metabolic specialization for flight circuitry”, it should be more clearly noted that the dense mitochondria could be unrelated to a putative electrical connection. If the authors have an alternative hypothesis about the mitochondria density, this should be stated as well.

      We agree with the reviewer that the link between mitochondrial density and metabolic specialization is purely speculative in this context. Based on reviewer feedback, we have moved all mention of the relationship between mitochondrial density and gap junction coupling to the Discussion. We acknowledge that this may seem like a somewhat random and not quantitatively supported observation. However, we found the coincidence striking and worthy of mention, though it is only tangentially relevant to the rest of the paper. From conversations with colleagues, we have also heard that this relationship is consistent with as yet unpublished work in other model organisms (e.g., zebrafish, mouse).

      The electrical coupling to b1 motor neurons is well-established (Fayyazuddin and Dickinson, 1999), and we have updated the text to state this more clearly. However, we agree that whether the specific neurons we have identified based on their anatomy are the same ones functionally identified through whole-nerve recordings remains unknown.

      (6) It would be appropriate to cite previous work using a similar strategy to match sensory axons to their cell bodies/dendrites at the periphery using driver lines and connectomics (see Figure 5 for example in the following paper: https://doi.org/10.7554/eLife.40247 ).

      At this point, there are now dozens of papers that match the axons of sensory neurons to their cell bodies/dendrites in the periphery by comparing light microscopy and connectomics. When we dug in, we found examples in C. elegans, Ciona intestinalis, zebrafish, and mouse, all published prior to the study cited above. For basically every animal for which scientists have acquired EM volumes of neural tissue, they have used other anatomical labeling methods to determine cell types inside and outside the imaged volume. In summary, we found it difficult to establish a single primary citation for this approach. In lieu of this, we have added a citation to an earlier review by a pioneer in EM connectomics that discusses the general approach of matching cells across different labeling/imaging modalities (Meinertzhagen et al., 2009).

      The methods section is very sparse. For the sake of replicability, all sections should be expanded upon.

      We have expanded the methods section and added a STAR methods table.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to identify the peripheral end-organ origin in the fly’s wing of all sensory neurons in the anterior dorsomedial nerve. They reconstruct the neurons and their downstream partners in an electron microscopy volume of a female ventral nerve cord, analyse the resulting connectome, and identify their origin with a review of the literature and imaging of genetic driver lines. While some of the neurons were already known through previous work, the authors expand on the identification and create a near-complete map of the wing mechanosensory neurons at synapse resolution.

      Strengths:

      The authors elegantly combine electron microscopy, neuron morphology, connectomics, and light microscopy methods to bridge the gap between fly wing sensory neuron anatomy and ventral nerve cord morphology. Further, they use EM ultrastructural observations to make predictions on the signaling modality of some of the sensory neurons and thus their function in flight.

      The work is as comprehensive as state-of-the-art methods allow to create a near-complete map of the wing mechanosensory neurons. This work will be of importance to the field of fly connectomics and modelling of fly behavior, as well as a useful resource to the Drosophila research community.

      Through this comprehensive mapping of neurons to the connectome, the authors create a lot of hypotheses on neuronal function, partially already confirmed with the literature and partially to be tested in the future. The authors achieved their aim of mapping the periphery of the fly’s wing to axonal projections in the ventral nerve cord, beautifully laying out their results to support their mapping.

      The authors identify the neurons in a previously published connectome of a male fly ventral nerve cord to enable cross-individual analysis of connections. Further, together with their companion paper, Dhawan et al. 2025, describing the haltere sensory neurons in the same EM dataset, they cover the entire mechanosensory space involved in Drosophila flight.

      Weaknesses:

      The connectomic data are only available upon request; the inclusion of a connectivity table of the reconstructed neurons would aid analysis reproducibility and cross-dataset comparisons.

      We have added a connectivity table as well as analysis scripts in the github repository for the paper (https://github.com/EllenLesser/Lesser_eLife_2025).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The methods section should be expanded in every aspect. Most pressing sections are:

      (1) Data and Code availability: All code should be included as a Zenodo database, the suggestion to ask authors for code upon request is inappropriate.

      We have added all code to a public github repository, which is now linked in the Methods section.

      (2) Samples: Standard cornmeal and molasses medium should have a reference, as many institutes use different recipes.

      The recipe used by the University of Washington fly kitchen is based on the Bloomington standard Cornmeal, Molasses and Yeast Medium recipe, which can be found at https://bdsc.indiana.edu/information/recipes/molassesfood.html. The UW recipe is slightly modified to use different antifungal ingredients and includes tegosept, propionic acid, and phosphoric acid.

      (3) Table 3: Driver lines labelling wing sensory neurons: The genetic driver lines should have associated Bloomington stock centre numbers. Additionally, relevant information for effector lines used should be included in the methods.

      We now include the Bloomington stock numbers and more information on effector lines in the STAR methods table.

      Minor corrections:

      (1) Lines 119-120: “Notably, many of the axons do not form crisp cluster boundaries, suggesting that multimodal sensory information is integrated at early stages of sensory processing.” We do not follow the logic of this statement and suspect it is a bit too speculative.

      We removed this sentence from the manuscript.

      (2) Figure 1: The ADMN is missing in the schematics and would be helpful to depict for non-experts. Is this what is highlighted in Figure 1D?

      Yes, and we now label 1D as the ADMN wing nerve.

      (3) Figure 1B: Which driver lines are being depicted here? Looking at Table 3 does not clarify. It should be specified at least in the figure legend.

      As stated in the legend, we include a table of all of the driver lines we screened and which sensory structures they label.

      (4) Figure 1C: There are some minor placement issues with the text in the schematic. There is an arrow very close to the “CO” on the top right, which makes the “O” look like the symbol for male. “ax ii” is a bit too close to the wing hinge

      We updated the figure to address this issue.

      (5) Figure 1D: The outlined grey masks are not clear. The use of colour would be very useful for the reader to help understand what the authors are referring to here

      We now use color for the masks.

      (6) Figure 2A: It is unclear if the descending neuron and non-motor efferent neuron are not shown because they are under the described threshold, or to simplify the plot. They should be included in the plot if over the threshold.

      We have updated the legend to specify that the descending and non-motor efferent neurons are excluded to visually simplify the plot. We include the % of sensory output to each of these neurons in the legend, and they are included in the connectivity matrix data in the public GitHub repository associated with the paper, which is linked in the Methods.

      (7) Figure 2B: What clustering is used specifically? The method says it’s from Scikit-learn, but there are many types of clustering available in this package.

      We now include the specific clustering type used in the Methods section, which is agglomerative clustering.

      (8) Figure 3A: What does the green box behind the plot represent?

      The green box represents the tegula CO axons, which we now specify in the legend.

      (9) Figure 3C: the “C” is clipped at the top.

      We updated the figure to address this issue.

      (10) Figure 4A: the main text says a “group of four axons” (line 203) while the figure says 5 axons.

      We updated the text to address this issue.

      (11) Line 360: “We found that the campaniform sensilla on the tegula provide the most direct feedback onto wing steering motor neurons”. We struggled to find where this was directly shown, because several sensory axon types directly synapse onto motor neurons.

      We now specify in the text that this finding is shown in Figure 3.

      Reviewer #3 (Recommendations for the authors):

      I would like to congratulate the authors on their beautiful, easy-to-read, and easy-to-comprehend manuscript, with clear figures and nice visualizations. This work provides a valuable resource that will contribute to the interpretability of connectomic data and further to connectome-based modeling of fly behavior.

      We sincerely appreciate the reviewer’s positive feedback.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Authors should be commended for the availability of data/code and detailed methods. Clarity is good. Authors have clearly spent a lot of time thinking about the challenges of metabolomics data analysis.

      Significance

      Schmidt et al. present MetaProViz, a comprehensive and modular platform for metabolomics data analysis. The tool provides a full suite of processing capabilities spanning metabolite annotation, quality control, normalization, differential analysis, integration of prior knowledge, functional enrichment, and visualization. The authors also include example datasets, primarily from renal cancer studies, to demonstrate the functionality of the pipeline. The MetaProViz framework addresses several long-standing challenges in metabolomics data analysis, particularly issues of reproducibility, ambiguous metabolite annotation, and the integration of metabolite features with pathway knowledge. The platform is likely to be a valuable addition for the community, but the reviewer has some comments that need to be addressed prior to publication.

      We thank the reviewer for this positive feedback.

      Comments:

      (1) (Planned)

      The section "Improving the connection between prior knowledge and metabolomics features" could benefit from additional clarification. It is not entirely clear to the reader what specific steps were taken beyond using RaMP-DB to translate metabolite identifiers. For example, how exactly were ambiguous mappings ("different scenarios") handled in practice, and to what extent does this process "fix" or merely flag inconsistencies? A more explicit description or example of how MetaProViz resolves these cases would help readers better understand the improvements claimed.

      We thank the reviewer for pointing this out and agree that this section needs to be extended for clarity. Beyond using RaMP-DB, we characterise the mapping ambiguity (one-to-none, one-to-many, many-to-one, many-to-many) within and across metabolite-sets (i.e. pathways) and return this information to the user together with the translated identifiers. This is important for understanding the potential inflation/deflation of metabolite-sets that occurs due to the translation. Moreover, we also offer a manually curated amino-acid collection to ensure that L-, D- and zwitterion (without chirality) IDs are assigned for amino acids (Fig. 2b). Ambiguous mappings are handled based on the measured data (Fig. 2e). Indeed, many translation cases that deflate (many-to-one mapping) or inflate (one-to-many mapping) the metabolite-sets are resolved when merging the prior knowledge with the actual measured data (e.g. Fig. 2e, one-to-many in scenario 1, which becomes obsolete as only one/none of the many potential metabolite IDs is detected). By sorting each mapping into one of those scenarios, we only flag those cases rather than resolving them automatically. The reason for this decision is that in many cases multiple choices are valid (e.g. Fig. 2e, Scenario 5: here the values of the two detected metabolites could be summed, or the metabolite value with the larger Log2FC could be kept), and it should be up to the user to make those choices based on their knowledge of the biological system and the analytical LC-MS method used.

      Since these points have not been clear enough, we will add a more explicit description to the results section by showcasing more details on how we exactly tackled this problem in the ccRCC example data. This has also been suggested by Reviewer 3 (Minor Comment 7 and 8), so feel free to also see the responses below.
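
      To make the four mapping categories concrete, here is a minimal sketch of how a translation table could be sorted into them. It is not MetaProViz code and does not use its API. The function name, the per-source-ID definition of each category, and the placeholder identifiers (e.g. `src_L_Ala`, `tgt_Ala`) are assumptions made for illustration; they are not real KEGG or HMDB IDs.

```python
from collections import defaultdict


def classify_mappings(pairs, source_ids):
    """Label each source ID by the cardinality of its translation.

    pairs      -- iterable of (source_id, target_id) translation entries
    source_ids -- all source IDs in the metabolite-set, including untranslated ones
    """
    targets_of = defaultdict(set)
    sources_of = defaultdict(set)
    for src, tgt in pairs:
        targets_of[src].add(tgt)
        sources_of[tgt].add(src)

    result = {}
    for src in source_ids:
        targets = targets_of.get(src, set())
        if not targets:
            result[src] = "one-to-none"  # no translation: deflates the set
            continue
        multiple_targets = len(targets) > 1
        # True if any target is also reached from a different source ID.
        shared_target = any(len(sources_of[t]) > 1 for t in targets)
        if multiple_targets and shared_target:
            result[src] = "many-to-many"
        elif multiple_targets:
            result[src] = "one-to-many"  # inflates the set
        elif shared_target:
            result[src] = "many-to-one"  # deflates the set
        else:
            result[src] = "one-to-one"
    return result


# Hypothetical translation table (placeholder IDs only).
pairs = [
    ("src_L_Ala", "tgt_Ala"), ("src_D_Ala", "tgt_Ala"),  # two IDs collapse to one
    ("src_B", "tgt_B1"), ("src_B", "tgt_B2"),            # one ID expands to two
    ("src_C", "tgt_C"),
    ("src_X", "tgt_X1"), ("src_X", "tgt_X2"), ("src_Y", "tgt_X2"),
]
source_ids = ["src_L_Ala", "src_D_Ala", "src_B", "src_C", "src_X", "src_Y", "src_Z"]

for src, kind in classify_mappings(pairs, source_ids).items():
    print(f"{src}: {kind}")
```

      Because the label is assigned per source ID, `src_Y` in this toy table is reported as many-to-one even though it shares a target with the many-to-many `src_X`. A component-level definition would be an equally reasonable choice.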

      (2) (Planned)

      The introduction of MetSigDB is intriguing, but its construction and added value are not sufficiently described. It would be helpful to clarify what specific advantages MetSigDB provides over directly using existing pathway resources such as KEGG, Reactome, or WikiPathways. For example, how many features, interactions, or metabolite-set relationships are included, and in what way are these pathways improved or extended compared to those already available in public databases?

      We thank the reviewer for this valuable comment and we apologise that this was not described sufficiently. One of the major advantages is that all the resources are available in one place following the same table format without the need to visit the different original resources and perform data wrangling prior to enrichment analysis. In addition, where applicable, we have removed metabolites that are not detectable by LC-MS (i.e. ions, H2O, CO2) to circumvent pathway inflation with features that are never within the data and hence impacting the statistical testing in enrichment analysis workflows.

      During the revision, we will compile an Extended Data Table listing all the resources present in MetSigDB, their number of features and interactions. We will also extend the methods section "Prior Knowledge access" about MetSigDB and how we removed metabolites.

      (3)

      Figure 1D/1E: The reviewer appreciates the inclusion of the visualizations illustrating the different mapping scenarios, as these effectively convey the complexity of metabolite ID translation. However, it took some time to interpret what each scenario represented. It would be helpful to include brief annotations or explanatory text directly on the figures to clarify what each scenario depicts and how it relates to the underlying issue being addressed.

      We think the reviewer refers to Fig. 2D/E and we acknowledge that this is a complex problem we try to convey. We received a similar comment from Reviewer 2 (Minor Comment 1), who asked to extend the figure legend description of what the different scenarios display.

      We have extended the figure legend and specifically explained each displayed case and its meaning (Lines 222-242):

      "d-e) Schematics of possible mapping cases between metabolite IDs (= each circle corresponds to one ID) of a pathway-metabolite set (e.g. KEGG) to metabolite IDs of a different database (e.g. HMDB) with (d) showing many-to-many mappings that can occur within and across pathway-metabolite sets and (e) additionally showing the mapping to metabolite IDs that were assigned to the detected peaks within and across pathway-metabolite sets. (d) Translating the metabolite IDs of a pathway-metabolite set can lead to special cases such as many-to-one mappings (Pathway 1), where for example the original resource used the ID for L-Alanine (Pathway 1, green) and D-Alanine (Pathway 1, yellow) in the amino-acid pathway, whilst the translated resource only has an entry for Alanine zwitterion (Pathway 1, blue). Additionally, many-to-one mappings can also occur across pathways (Pathway 2-4), where this mapping is only detected when mappings are analysed taking all pathways into account. Both of these cases deflate the pathways, which can also happen for one-to-none mappings (Pathway 1, white). There are also cases that inflate the pathway such as one-to-many mappings (e.g. Pathway 2-4, orange mapping to pink and violet). (e) Showcasing the different scenarios when merging measured data (detected) based on the translated metabolites within pathways (scenario 1-5) and across pathways (scenario 6-8) highlighting problematic scenarios (4-7) that require further actions. Unproblematic scenarios (1-3 and 8) can include special cases between original and translated (i.e. one-to-many in scenario 1), which become obsolete as only one/none of the many potential metabolite IDs is detected. Yet, if multiple metabolites are detected action is required (scenario 5), which can include building the sum of the multiple detected features or only keeping the one with the highest Log2FC between two conditions. Other special cases between original and translated (i.e. many-to-one in scenario 4 and 6) also depend on what has been mapped to the measured features. If features have been measured in those scenarios, pathway deflation (i.e. only one original entry remains) or measured feature duplication (the same measurement is mapped to many features in the prior knowledge) are the possible results within and across pathways. Those scenarios should be addressed on a case-by-case basis as they also require biological information to be taken into account."

      We have also rearranged the Scenarios in Fig. 2e. We hope that together with the extended figure legend this is now clear.

      (4) (Planned)

      "By assigning other potential metabolite IDs and by translating between the present ID types, we not only increase the number of features within all ID types but also increase the feature space with HMDB and KEGG IDs (Fig. 2a, right, SFig. 2 and Supplementary Table 1)". The reviewer would appreciate additional clarification on how this was done. It is not clear what specific steps or criteria were used to assign additional metabolite IDs or to translate between identifier types. The reviewer also appreciates the inclusion of the UpSet plots. However, simply having the plots side-by-side makes it difficult to determine the specific differences. An alternative visualization, such as stacked bar plots, scatter plots summarizing the changes in feature counts, or other representation that more clearly highlights the deltas, might make these results easier to interpret.

      The main Fig. 2a shows the original (left) metabolite ID availability per detected metabolite feature in the ccRCC data and the adapted (right) metabolite IDs. The individual steps taken to extend the metabolite ID coverage of the measured features and obtain Fig. 2a (right) are shown in SFig. 2 for HMDB (SFig. 2a) and KEGG (SFig. 2b). We did not include the plots for the PubChem IDs as they follow the same principle. The individual steps we showcase in SFig. 2 are (I) how many of the detected features (577) have an HMDB ID (341, red bar + grey bar), (II) how this distribution changed after equivalent amino-acid IDs were added, which does not change the number of features with an HMDB ID but does change the number of features with a single HMDB ID, and (III) how this distribution changed after translating from the other available ID types (KEGG and PubChem) to HMDB IDs using RaMP-DB's knowledge, which leads to 430 detected features with one or multiple HMDB IDs. The exact numbers can be extracted from Supplementary Table 1, Sheet "Feature metadata", where for example N-methylglutamate had no HMDB ID assigned in the original publication (see column HMDB_Original), yet by translating to HMDB from KEGG (see column hmdb_from_kegg) and from PubChem (see column hmdb_from_pubchem) we obtain in both cases the same HMDB ID "HMDB0062660". To clarify this in the manuscript, we have extended the figure legend of SFig. 2: "a-b) Bar graphs showing the frequency at which a certain number of metabolite IDs per integrated peak are available as per the ccRCC patients' feature metadata provided in the original publication (left), after potential equivalent IDs for amino-acid and amino-acid-related features were assigned (middle), which increases the number of features with multiple IDs (middle: grey bars), and after IDs were translated from the other available ID types (right). a) Of 577 detected features, 341 had at least one HMDB ID assigned (left graph, red + grey bar) according to the original publication (left). Translating from KEGG-to-HMDB and from PubChem-to-HMDB increased the number of features with an HMDB ID from 341 to 430 (right). b) Of 577 detected features, 306 had at least one KEGG ID assigned (left graph, red + grey bar) according to the original publication (left). Translating from HMDB-to-KEGG and from PubChem-to-KEGG did not increase the total number of features with a KEGG ID (right)."

      We like the suggestion of the reviewer to provide representations of the deltas and will add additional plots to SFig. 2 as part of our planned revision.

      (5) (Planned)

      MetaboAnalyst is mentioned several times in the manuscript. The reviewer is familiar with some of the limitations and practical challenges associated with using MetaboAnalyst and its R package. Given that MetaboAnalyst already offers some overlapping functionality with MetaProViz (and offers it in the form of an interactive website and a sometimes functional R package), a more explicit comparison between the two tools would help readers fully understand the unique advantages and improvements provided by MetaProViz.

      This is a good point the reviewer raises. As part of the revisions, we plan to create a supplementary data table that includes both tools and their respective features. We will refer to this table within the manuscript text.

      (6)

      Page 11: The authors state that they used limma for statistical testing, including for the analysis of exometabolomics data, where the values appear to represent log2-transformed distances or ratios rather than normally distributed intensities. Since limma assumes approximately normal residuals, please provide evidence or justification that this assumption holds for these data types. If the distributions deviate substantially from normality, a non-parametric alternative might be more appropriate.

      For exometabolomics data we use data normalised to media blank and growth factor (formula (1)). Limma is performed on those data, not on the log2-transformed distances. The Log2(Distance) is calculated separately from the statistical results using the normalised exometabolomics data. In addition, we always perform the Shapiro-Wilk test on each metabolite as part of the MetaProViz differential analysis function to understand the distribution. In this particular case we have the following distributions:

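      | Cell line | Metabolites normal distribution [%] | Metabolites not-normal distribution [%] |
      |-----------|-------------------------------------|------------------------------------------|
      | HK2       | 82.35                               | 17.65                                    |
      | 786-O     | 95.71                               | 4.29                                     |
      | 786-M1A   | 97.14                               | 2.86                                     |
      | 786-M2A   | 88.57                               | 11.43                                    |
      | OSRC2     | 92.86                               | 7.14                                     |
      | OSLM1B    | 85.71                               | 14.29                                    |
      | RFX631    | 97.14                               | 2.86                                     |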

      If a user's data deviate substantially from normality, non-parametric alternatives are also available in MetaProViz (see the methods section for all options).

      (7)

      Page 13: why were young and old defined this way? Authors should provide their reasoning and/or citations for this grouping.

      We thank the reviewer for pointing this out. Our choice of age groups is based purely on the literature:

      First, ccRCC can be sporadic (>96%) or familial (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308682/pdf/nihms362390.pdf). This was also observed in other cohorts, where of 1233 patients only 93 (7.5%) were under 40 years of age, whilst 1140 (92.5%) were older than 40 years (https://www.europeanurology.com/article/S0302-2838(06)01316-9/fulltext). Second, given the high frequency of sporadic cases, it is unsurprising that ccRCC incidences were found to peak in patients aged 60 to 79 years, with more male than female incidences (https://journals.lww.com/md-journal/Fulltext/2019/08020/Frequency,_incidence_and_survival_outcomes_of.49.aspx). Third, it was shown that sex impacts renal cancer-specific mortality and that this effect is modified by age, which is a proxy for hormonal status, with the premenopausal period below 42 years and the postmenopausal period above 58 years (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4361860/pdf/srep09160.pdf). Putting all of this information together, we decided on our age groups of young (<42 years) and old (>58 years), following the hormonal periods in order to account for the impact of sex. Additionally, our young age group is representative of the age of familial ccRCC, whilst our old age group summarises the age group where incidences were found to peak.

      To make this clear in the manuscript we have extended the method section of the manuscript (Line 547-548):

      "For the patients' ccRCC data, we compared tumour versus normal of two patient subsets, "young" (<42 years) and "old" (>58 years)."

      (8)

      Figure 4e: It may help with interpretation to have these Sankey-like graph edges be proportional to the number of metabolites.

      We thank the reviewer for this suggestion, which we also pondered. When we tested this visualisation, the plot became convoluted, hard to interpret and not all potential flows exist in the data. This is why we have opted to create an overview graph of each potential flow, with each edge representing a potentially existing flow. The number of times a flow exists is shown in Fig. 4f.

      (9)

      Figure 4h: The values appear to be on an intensity scale (e.g., on the order of 3e10), yet some of them are negative, which would not be expected for raw or log-transformed mass spectrometry intensities. It is unclear whether these represent normalized abundance values, distances, or some other transformation. In addition, for the comparison of tumour versus normal tissue, it is not specified what statistical test was applied. Since mass spectrometry data are typically log2-transformed to approximate a log-normal distribution before performing t-tests or similar parametric methods, clarification is needed on how these data were processed.

      Thanks for pointing this out; it made us realise that we need to extend our figure legend for Fig. 4h for clarity (Line 343-345). In both cases we show normalised intensities following the workflow described in Fig. 3a. The left graph, labelled "CoRe", shows an exometabolomics experiment, for which the data were additionally normalised using both media blanks (samples in which no cells were cultured) and growth factor (accounts for cell growth during the experiment), as growth rate (accounts for variations in cell proliferation) was not available (see also formula (1) in the methods section). A result has a negative value if the metabolite has been consumed from the media, or a positive value if the metabolite has been released from the cell into the culture media.

      In addition, the reviewer refers to the comparison of tumour versus normal (Fig. 4a and 4d) and the missing description of the chosen statistical test. We have added the details to the figure legend (Lines 334 and 345).

      Adapted legend Fig. 4: "a) Differential metabolite analysis results for exometabolomics data comparing 786-O versus HK2 cells using ANOVA and false discovery rate (FDR) for p-value adjustment. b) Heatmap of mean consumption-release of the measured metabolites across cell lines. c) Heatmap of normalised ccRCC cell line exometabolomics data for the selected metabolites of amino acid metabolism for a sample subset. d) Differential metabolite analysis results for intracellular data comparing 786-O versus HK2 cells using ANOVA and false discovery rate (FDR) for p-value adjustment. e) Schematics of bioRCM process to integrate exometabolomics with intracellular metabolomics and f) number of metabolites by their combined change patterns in intracellular- and exometabolomics in 786-M1A versus HK2. g) Heatmap of the metabolite abundances in the "Both_DOWN (Released/Comsumed)" cluster. h) Bar graphs of normalised methionine intensity for exometabolomics (CoRe: negative value if the metabolite has been consumed from the media, or positive value if the metabolite has been released from the cell into the culture media) and intracellular metabolomics (Intra)."


      (10)

      Figure 5: "Tukey's p.adj

      We thank the reviewer for pointing this out. We have used the TukeyHSD (Tukey's Honestly Significant Difference) test in R on the ANOVA results. We have added more details to the figure legend (Line 384): "(Tukey's post-hoc test after ANOVA p.adj

      (11)

      The potential for multi-omics is mentioned. Please clarify how generalizable this framework is. Can it readily accommodate transcriptomics, proteomics, or fluxomics data, or does it require custom logic or formatting for each new data type?

      Thanks for raising this question. MetaProViz can readily accommodate transcriptomics and proteomics data for combined enrichment analysis using, for example, MetalinksDB metabolite-receptor pairs. Yet, MetaProViz does not support modelling fluxomics data into metabolic networks. We state in the discussion that this could be a future development ("Beyond current capabilities, future developments could also incorporate mechanistic modeling to capture metabolic fluxes, subcellular compartmentalization, enzyme kinetics, regulatory feedback loops, and thermodynamic constraints to dissect metabolic response under perturbations."). To clarify the availability of multi-omics integration for combined enrichment analysis, we have added more details to the discussion section.

      Line 467-469: "In addition, providing knowledge of receptor-, transporter- and enzyme-metabolite pairs, MetaProViz can readily accommodate transcriptomics and proteomics data for combined enrichment analysis."

      (12)

      Please clarify if/how enrichment analyses account for varying set sizes and redundant metabolite memberships across pathways, which can bias over-representation analysis results.

      This is a very relevant point, which we have already been working on. Indeed, we agree that enrichment results can be biased by varying set sizes and redundant metabolite memberships across pathways. MetaProViz explicitly accounts for varying set sizes when running over-representation analysis (functions standard_ora() and cluster_ora()), which uses a model that computes the p-value under a hypergeometric distribution. Thereby, larger pathways are penalised unless the overlap is proportionally large, while smaller pathways can be significant with fewer overlaps. Hence, the test quantifies whether the observed overlap between the query set and a pathway is larger than would be expected under random sampling. In addition, we explicitly filter by gene-set size using min_gssize/max_gssize, which further controls for extremely small or large sets. So both the statistical test itself and the size filters incorporate gene-set size variation.
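
      To make the set-size argument concrete, the following is a minimal Python sketch of a hypergeometric over-representation p-value and a size filter. It is not the code behind standard_ora() or cluster_ora(); the function names `ora_pvalue` and `filter_by_size`, their parameters, and the numbers in the example are assumptions chosen for illustration.

```python
from math import comb


def ora_pvalue(overlap, query_size, set_size, universe_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X counts how many of `query_size` metabolites drawn at random from a
    universe of `universe_size` fall into a pathway of `set_size` members.
    """
    upper = min(query_size, set_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def filter_by_size(sets, min_size, max_size):
    """Keep only sets whose size lies within [min_size, max_size]."""
    return {name: s for name, s in sets.items() if min_size <= len(s) <= max_size}


# Same overlap (3 of 20 query metabolites), different pathway sizes.
p_small = ora_pvalue(overlap=3, query_size=20, set_size=10, universe_size=500)
p_large = ora_pvalue(overlap=3, query_size=20, set_size=100, universe_size=500)
```

      With these illustrative numbers the small pathway yields the lower p-value, because three hits are far more than expected by chance for a 10-member set but fewer than expected for a 100-member set.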

      Regarding the redundant metabolite-set (i.e. pathways) memberships, we have now implemented a new function (cluster_pk()) to cluster metabolite-sets like pathways based on overlapping metabolites. Thereby we allow investigation of enrichment results in regard to redundancy and similarity. For given metabolite-sets, the function calculates pathway similarities via either overlap- or correlation-based metrics. After optional thresholding to remove weak similarities, we implemented three clustering algorithms (connected-components clustering, Louvain community detection and hierarchical clustering) to group similar pathways. We then visualize the clustering results as a network graph using the new function viz_graph based on igraph. We have added all information into our methods section "Metabolite-set clustering" (Lines 656-671). In addition, we have also added the results of the clustering into Fig. 5f.
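
      The idea of grouping redundant metabolite-sets can be sketched as follows. This is an illustrative Python example, not the implementation of cluster_pk(): it assumes the overlap coefficient as the similarity metric, a fixed threshold, and connected-components clustering only; the pathway names, metabolite labels and function names are hypothetical.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Shared members divided by the size of the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_sets(sets, threshold):
    """Group sets into connected components of the thresholded similarity graph."""
    names = list(sets)
    adjacency = {name: set() for name in names}
    for a, b in combinations(names, 2):
        # Thresholding: drop weak similarities before clustering.
        if overlap_coefficient(sets[a], sets[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


pathways = {
    "Pathway A": {"m1", "m2", "m3", "m4"},
    "Pathway B": {"m2", "m3", "m4", "m5"},
    "Pathway C": {"m5", "m6"},
    "Pathway D": {"m7", "m8"},
}
strict = cluster_sets(pathways, threshold=0.6)
loose = cluster_sets(pathways, threshold=0.5)
```

      The choice of threshold matters: in this toy input a stricter threshold keeps Pathway C separate, whereas a looser one chains it to Pathways A and B through its single shared metabolite with Pathway B.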

      New Fig. 5f: "f) Network graph of top enriched pathways (p.adjusted

      Reviewer #2

      Evidence, reproducibility and clarity

      Schmidt et al report the development of MetaProViz, an integrated R package to process, analyze and visualize metabolomics data, including integration with prior knowledge. The authors then go on to demonstrate utility by analyzing several metabolomes of cell lines, media and patient samples from kidney cancer. The manuscript provides a concise description of key challenges in metabolomics that the authors identify and address in their software. The examples are helpful and illustrative, although I should point out that I lack the expertise to evaluate the R package itself. I only have a few very minor comments.

      Significance

      This is a very significant advance from one of the leading groups in the field that is likely to enhance metabolomics data analysis in the wider community.

      We thank the reviewer for this positive feedback on our package. We appreciate that there are no major comments from the reviewer.

      Minor comments:

      (1)

      Figure 2D, E: While the schematics are fairly intuitive, a brief figure legend description of what the different scenarios etc. represent would make this easier to grasp.

      We thank the reviewer for pointing this out and we acknowledge that this is a complex problem we try to convey. We received a similar comment from Reviewer 1 (Comment 3), so please see the extensive response there. In brief, we have extended the figure legend and specifically explained each displayed case and its meaning (Line 222-242) and extended the Figure itself by adding additional categories to Fig. 2e.

      Extended legend Fig. 2d-e: "d-e) Schematics of possible mapping cases between metabolite IDs (each circle corresponds to one ID) of a pathway-metabolite set (e.g. KEGG) and metabolite IDs of a different database (e.g. HMDB), with (d) showing many-to-many mappings that can occur within and across pathway-metabolite sets and (e) additionally showing the mapping to metabolite IDs that were assigned to the detected peaks within and across pathway-metabolite sets. (d) Translating the metabolite IDs of a pathway-metabolite set can lead to special cases such as many-to-one mappings (Pathway 1), where for example the original resource used the IDs for L-Alanine (Pathway 1, green) and D-Alanine (Pathway 1, yellow) in the amino-acid pathway, whilst the translated resource only has an entry for Alanine zwitterion (Pathway 1, blue). Additionally, many-to-one mappings can also occur across pathways (Pathway 2-4), where the mapping is only detected when mappings are analysed taking all pathways into account. Both of these cases deflate the pathways, which can also happen for one-to-none mappings (Pathway 1, white). There are also cases that inflate the pathway, such as one-to-many mappings (e.g. Pathway 2-4, orange mapping to pink and violet). (e) Showcasing the different scenarios when merging measured data (detected) based on the translated metabolites within pathways (scenario 1-5) and across pathways (scenario 6-8), highlighting problematic scenarios (4-7) that require further action. Unproblematic scenarios (1-3 and 8) can include special cases between original and translated (i.e. one-to-many in scenario 1), which become obsolete as only one or none of the many potential metabolite IDs is detected. Yet, if multiple metabolites are detected, action is required (scenario 5), which can include building the sum of the multiple detected features or only keeping the one with the highest Log2FC between two conditions. Other special cases between original and translated (i.e. many-to-one in scenario 4 and 6) also depend on what has been mapped to the measured features. If features have been measured in those scenarios, pathway deflation (i.e. only one original entry remains) or measured feature duplication (the same measurement is mapped to many features in the prior knowledge) are the possible results within and across pathways. Those scenarios should be addressed on a case-by-case basis as they also require biological information to be taken into account."
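
      The two options named for scenario 5 (summing the detected features, or keeping the one with the highest Log2FC) can be sketched as below. This is an illustrative Python example under stated assumptions, not MetaProViz code: the function name `resolve_multiple`, the record fields and the values are hypothetical.

```python
def resolve_multiple(features, strategy):
    """Collapse several detected features that map to one prior-knowledge entry.

    `features` is a list of dicts with the keys 'id', 'intensity' and 'log2fc'.
    """
    if not features:
        return None
    if strategy == "sum":
        # Build the sum of the intensities; a Log2FC would then have to be
        # recomputed from the summed intensities, so none is returned here.
        return {
            "id": "+".join(f["id"] for f in features),
            "intensity": sum(f["intensity"] for f in features),
        }
    if strategy == "max_log2fc":
        # Keep only the feature with the highest Log2FC between two conditions.
        return max(features, key=lambda f: f["log2fc"])
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical detected features mapped to the same translated metabolite.
detected = [
    {"id": "feature_1", "intensity": 100.0, "log2fc": 0.4},
    {"id": "feature_2", "intensity": 50.0, "log2fc": 1.2},
]
summed = resolve_multiple(detected, "sum")
kept = resolve_multiple(detected, "max_log2fc")
```

      Which strategy is appropriate depends on the biology, in line with the legend's remark that these scenarios should be addressed case by case.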

      (2) Fig. 4: The authors briefly state that they integrate prior knowledge to identify the changes in methionine metabolism in kidney cancer, but it is not clear how exactly they contribute to this conclusion. It could be helpful to expand a bit on this to better illustrate how MetaProViz can be used to integrate prior knowledge into the analysis workflow.

      We think the reviewer refers to this section in the text (Line 363-370):

      "Next, we focused on the cluster "Both_DOWN (Released-Consumed)" and found that several amino acids are consumed by the ccRCC cell line 786-M1A but released by healthy HK2 cells. At the same time, intracellular levels are significantly lower than in HK2 (Log2FC = -0.9, p.adj = 4.4e-5) (Fig. 4g). To explore the role of these metabolites in signaling, we queried the prior knowledge resource MetalinksDB, which includes metabolite-receptor, metabolite-transporter and metabolite-enzyme relationships, for their known upstream and downstream protein interactors for the measured metabolites (Supplementary Table 5). This approach is especially valuable for exometabolomics, as it allows us to generate hypotheses about cell-cell communication. Notably, we identified links involving methionine (Fig. 4h), enzymes such as BHMT, and transporters such as SLC43A2 that were previously shown to be important in ccRCC25,42 (Supplementary Table 5)."

      We have now extended this part to clearly state that MetalinksDB is the prior knowledge resource we used to identify the links for methionine (Line 363-364). In addition, we have extended our summary statement to make clear to the reader that we combine the biological clustering, which revealed the amino acid changes, with prior knowledge for mechanistic insight (Line 380-381):

      "In summary, calculating consumption-release and combining it with intracellular metabolomics via biological regulated clustering reveals metabolites of interest. Further combining these results with prior knowledge using the MetaProViz toolkit facilitates biological interpretation of the data."

      (3)

      Given the functional diversity among metabolites -central to diverse pathways, are key signaling molecules, restricted functions, co-variation within a pathway - I wonder how informative approaches such as PCA or enrichment analyses are for identifying metabolic drivers of a (patho)physiological state. To some extent, this can be addressed by integrating prior knowledge, and it would be helpful if the authors could comment on (and if applicable explain) whether/how this is integrated into MetaProViz.

      The reviewer is correct about the functional diversity of metabolites, which is also why prior knowledge is needed to add mechanistic interpretation to the findings from the metadata analysis (as we showcased by focusing on the separation by age (Fig. 5c-d)). We think that approaches such as PCA or enrichment analysis can be helpful, even if admittedly limited. For example, in the metadata analysis presented in Fig. 5b and the subsequent enrichment analysis presented in Fig. 5, we used PCA to extract the eigenvectors and the loadings, which act as weights indicating the contribution of each original metabolite to the separation along a specific principal component. Hence, the eigenvectors of the PCA show the metabolite drivers of the separation. This does not necessarily mean that those metabolites are drivers of a (patho)physiological state; the (patho)physiological state can equally be the reason for those metabolites driving the separation on the eigenvectors. Thus, the metadata analysis presented in Fig. 5b enables us to extract the metadata variables, i.e. the (patho)physiological states, that are separated on a PC, together with the explained variance. This can also lead to co-variation, when multiple (patho)physiological states are separated on the same PC, as the reviewer correctly points out. Regarding the enrichment analysis, we provide different types of prior knowledge for classical mapping, but also the prior knowledge we used to create the biological regulated clustering, which together help to identify key metabolic groups, as we can first cluster the metabolites and afterwards perform functional enrichment. Yet, this does not account for the technical issues of enrichment analysis. In this context, multi-omics integration building metabolic-centric networks could further elucidate the diversity of metabolic pathways and their connection to signalling and co-variation, yet this is beyond the scope of MetaProViz. To sum up, we are aware of the limitations of this analysis and the constraints on the downstream interpretation.

      To capture the functional diversity amongst metabolites, which leads to metabolites being present in multiple pathways of metabolite-pathway sets, we have implemented a new function to cluster metabolite-sets like pathways based on overlapping metabolites and to visualize redundant metabolite-set (i.e. pathway) memberships (Fig. 5f). For more details, see also our response to Reviewer 1, Comment 12. We hope this will help avoid mis- and over-interpretation of the enrichment results.

      In addition, we have extended the text to include the analysis pitfalls explicitly (Line 416-419): "Another variable explaining the same amount of variance in PC1 is the tumour stage, which could point to adjacent normal tissue metabolic rewiring that happens in relation to stage and showcases that biological data harbour co-variations, which cannot be disentangled by this method."

      Reviewer #3

      Evidence, reproducibility and clarity

      This manuscript introduces an R package, MetaProViz, for metabolomics data analysis (post-annotation), aiming to solve a poor-analysis-choices problem and enable more people to do the analysis. MetaProViz not only guides people to select the best statistical method, but also helps solve previously unsolved problems, e.g. multiple and variable metabolite names in different databases and their connections to prior knowledge. They also created exometabolomics analysis and the steps needed to visualise intracellular/media processes. The authors demonstrated their new package on a kidney cancer (clear-cell renal cell carcinoma) dataset, stepping one step closer to improving the biological interpretability of omics data analysis.

      Significance

      This is a great tool and I can't wait to use it on many upcoming metabolomics projects! Authors tackle multiple ongoing issues within the field: from poor selection of statistical methods (they provide guidance or have default safer options) to the messiness of data annotation between databases and improving data interpretability. The field is still evolving quickly, and it's impossible to solve all problems with one package; thus some limitations within the package could be seen as a bit rigid. Nonetheless, this fully steps toward filling an existing methodological gap. All bioinformaticians doing metabolomic analysis, or those learning how to do it, will greatly benefit from this knowledge.

      I myself lead a team of 6 bioinformaticians, and we do analysis for researchers, clinicians, drug discovery, and various companies. We run internal metabolomics pipelines every day and fully sympathise with the problems addressed by the authors.

      Major comments affecting conclusions

      none.

      We thank the reviewer for this positive feedback on evidence, reproducibility and clarity, as well as on the significance of our work, given the reviewer's stated experience with metabolomics data analysis. We appreciate that there are no major comments from the reviewer.

      Minor comments

      Minor comments, important issues that could be addressed and possibly improve the clarity or general presentation of the tool. Please see all below.

      (1)

      1- You start with separating and talking about metabolomics and lipidomics, but lipidomics quickly disappears (especially beyond abstract/intro) - no real need to discuss lipidomics.

      Thanks, that's a good note and we have removed it from the abstract and introduction.

      (2)

      2- You refer to the MetImp4 imputation web tool, but I cannot find an active website, manuscript, or R package for it, and the cited link does not load. This raises doubts about whether the tool is currently usable. Additionally, imputation choice should be guided by biological context and study design, not just by testing a few methods and selecting the one that performs best.

      We fully agree with the reviewer on imputation handling. The manuscript we cite from Wei et al. (https://doi.org/10.1038/s41598-017-19120-0) compared a multitude of missing value imputation methods and made this comparison strategy available as a web-based tool, not as a code-based package such as an R package. Yet, the reviewer is right that the web-tool is no longer reachable. Hence, we have adapted the statement in our introduction (Line 61-62): "Moreover, there are tools that focus on specific steps of the pre-processing of feature intensities, which encompasses feature selection, missing value imputation (MVI)9 and data normalisation. For example, MetImp4 is a web-tool that includes and compares multiple MVI methods9."

      (3)

      3- The authors address key metabolomics issues such as ambiguous metabolite names and isoforms, and their focus on resolving mapping ambiguities and translating between database identifiers is highly valuable. However, the larger challenge of de novo identification and the "dark matter" of unannotated metabolites remains unresolved (initiatives as MassIVE might help in the future https://massive.ucsd.edu/ProteoSAFe/ ), and readers may benefit from clearer acknowledgement that MetaProViz does not operate on raw spectral data. The introduction currently emphasizes annotation, but since MetaProViz requires already annotated metabolite tables (and then deals with all the messiness), this space might be better used to frame the interpretability and pathway-analysis challenges that the tool directly addresses.

      We appreciate the comment and have highlighted this in the abstract and introduction: "MetaProViz operates on annotated intensity values..." (Line 29 and 88).

      Given the newest advancements in metabolite identification using AI-based methods, MetaProViz toolkit with a focus on connecting metabolite IDs to prior knowledge becomes increasingly valuable. We added this to our discussion (Line 484-488): "Given the imminent shift in metabolite identification through AI-based approaches, including language model-guided48 methods and self-supervised learning49, the growing number of identified metabolites will make the MetaProViz toolkit increasingly valuable for the community to gain functional insights."

      Regarding the introduction, where we mention some tools for peak annotation: we included this paragraph naming peak annotation tools because we wanted to set the basis by (I) listing the different steps of metabolomics data analysis and (II) pointing to well-known tools for those steps. We also have a dedicated paragraph for pathway-analysis challenges.

      (4)

      4- I also really enjoyed you touching on the point of user-friendly but then inflexible and problem of reproducibility. We truly need well working packages for other bioinformaticians, rather than expecting wet-lab scientists to do all the analysis within the user interface.

      We thank the reviewer for this positive feedback.

      (5)

      5- It would be helpful to explain why the authors chose cancer/RCC samples for the demonstration. Was it because the dataset included both media and cell measurements? Does the tool perform best when multiple layers of information are available from the same experiment?

      We specifically chose the ccRCC cell line data as an example since, for a multitude of cell lines, both media (exometabolomics) and intracellular metabolomics had been performed. The combination of both data types is only used in the biological regulated clustering (Fig. 4e-g); all other analyses do not require additional data modalities. We have not specifically tested how performance differs for this particular case, as it would require multiple paired datasets (exometabolomics and intracellular metabolomics) taken at the same time and at different times.

      (6)

      6- Figure 2B: The upset plots effectively show increased overlap after adaptation, but it would be easier to compare changes if the order of the intersection bars in the "adapted" plot matched the original. For example, while total intersections increased (251→285), the PubChem+KEGG overlap decreased (24→5), likely due to reallocation to the full intersection.

      Thanks for raising this point. We had initially ordered the bars based on their intersection size, but we agree with the reviewer that for our point it makes sense to fix the order in the adapted plot to match the order of the original plot. We have done this (Fig. 2a) and also extended the figure legend text of SFig. 2, which shows the individually performed adaptations summarized in Fig. 2a.

      (7) (Planned)

      7- In your example of D-alanine and L-alanine - you mention how chirality is important biological feature, but up to this point it's not clear how do you do translation exactly and in which situations this would be treated just as "alanine" and when the more precise information would be retained? You mention RaMP-DB knowledge and one to X mappings as well as your general guidance in the "methods" part, but it would be useful to describe in this publication how you exactly tackled this problem in the ccRCC case.

      We thank the reviewer for this suggestion. Since this is a complex problem, we will add a more explicit description to the results section by showcasing more details on how we exactly tackled this problem in the ccRCC example data.

      Regarding D- and L-alanine: even though chirality is an important biological feature, a standard experiment cannot distinguish whether the detected amino acid is the L- or the D-enantiomer. This is why we assign all possible IDs to increase the overlap with the prior knowledge. In Fig. 2b we show that this can lead to the same measured feature being mapped to multiple pathways. For example, if we measure alanine, we assign the PubChem IDs for L-Alanine, D-Alanine and Alanine and then map them to metabolite-sets that include both L-Alanine and D-Alanine. This can fall into Scenario 6 (Fig. 2e), where across pathways there is a D-Alanine-specific one (Pathway 1) and an L-Alanine-specific one (Pathway 2). We can then decide whether to allow both mappings (many-to-one) or to exclude D-Alanine because we know our biological system is human and should primarily contain L-Alanine.
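      The decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the MetaProViz implementation: the identifiers, the two pathway sets and the function name `map_feature` are hypothetical placeholders.

```python
# Hypothetical identifiers and metabolite-sets, for illustration only.
CANDIDATE_IDS = {
    "alanine": ["ID_alanine", "ID_L_alanine", "ID_D_alanine"],
}
PATHWAYS = {
    "Pathway 1": {"ID_D_alanine"},  # D-alanine-specific set
    "Pathway 2": {"ID_L_alanine"},  # L-alanine-specific set
}


def map_feature(feature, excluded_ids=frozenset()):
    """Return {pathway: matched IDs} for one measured feature.

    All candidate IDs are tried unless they are explicitly excluded,
    e.g. because the stereoisomer is not expected in the organism.
    """
    ids = set(CANDIDATE_IDS.get(feature, [])) - set(excluded_ids)
    hits = {}
    for pathway, members in PATHWAYS.items():
        matched = sorted(ids & members)
        if matched:
            hits[pathway] = matched
    return hits


# Permissive: the single measured feature maps to both pathways.
permissive = map_feature("alanine")
# Restrictive: drop D-alanine for a human system.
human_only = map_feature("alanine", excluded_ids={"ID_D_alanine"})
```

      In the permissive call the one measured feature lands in both pathways; excluding the D-form leaves only the L-alanine-specific pathway.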

      (8) (Planned)

      8- In one to many mappings, it would be interesting to see quantification how frequently it was happening within a pathway or across pathways. I.e. Would going into pathway analysis "solve" the issue of "lost in translation" or not really?

      We have quantified the frequency for the example of translating the KEGG metabolite-set into HMDB IDs (Fig. 2c, left panel). Yet, we are not showcasing the quantification across the KEGG metabolite-sets with this plot. During the revision we will add the full results available to the Extended Data Table 2, which currently only includes the results displayed in Fig.2c.

      (9)

      9- QC: the coefficient of variation (CV) helps identify features with high variability and thus low detection accuracy. Here it's important to acknowledge that if the feature is very variable between groups it can be extremely important, but if the feature is very variable within the group - only then one would have low trust in the accuracy.

      Yes, we totally agree with the reviewer on this. For this reason, we have applied CV only in instances where this is not leading to any condition-driven CV differences, but is truly feature-focused: (1) Function pool_estimation performs CV on the pool samples only, which are a homogeneous mixture of all samples, and hence can be used to assess feature variability. (2) Function processing performs CV on exometabolomics media samples (=blanks), which are also not impacted by different conditions.
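      As a rough sketch of what a feature-focused CV check on pool samples looks like, assuming a plain dictionary of intensities per feature: the 0.3 cutoff, the data and the function names below are illustrative assumptions and do not reproduce the pool_estimation or processing functions.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    m = mean(values)
    return stdev(values) / m if m else float("inf")


def flag_variable_features(pool_intensities, cv_threshold=0.3):
    """Return {feature: True if the CV across pool samples exceeds the cutoff}.

    pool_intensities maps a feature name to its intensities measured in
    repeated injections of the same pooled sample, so any spread reflects
    measurement variability rather than biology.
    """
    return {
        feature: coefficient_of_variation(values) > cv_threshold
        for feature, values in pool_intensities.items()
    }


# Made-up intensities for two features in three pool injections.
pool = {
    "stable_feature": [100.0, 102.0, 98.0],
    "noisy_feature": [100.0, 10.0, 200.0],
}
flags = flag_variable_features(pool)
```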

      (10)

      10- Missing value imputation - while missing not at random is a great way to deal with missingness, it would be great to have options for others (not just MNAR), as missingness is of a complex nature. If a pretty strong decision has been made, it would be good to support this by some supplementary data (i.e. how results change while applying various combinations of missingness and why choosing MNAR seems to be the most robust).

      We have decided to only offer support for MNAR, since we would recommend MVI only if there is a biological basis for it.

      As mentioned in the response to your minor comment 2, Wei et al. (https://doi.org/10.1038/s41598-017-19120-0) compared a multitude of missing value imputation methods. They compared six imputation methods (i.e., QRILC, Half-minimum, Zero, RF, kNN, SVD) for MNAR and systematically measured the performance of those imputation methods. They showed that QRILC and Half-minimum produced much smaller SOR values, showing consistently good performance on data with different numbers of missing variables. This was the reason for us to only provide Half-minimum.
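      For readers unfamiliar with half-minimum imputation, a minimal sketch follows. It assumes missing values are encoded as `None` and imputes one feature at a time; the function name is hypothetical and this is not the package code.

```python
def half_minimum_impute(values):
    """Replace missing entries (None) with half of the smallest observed value.

    If nothing was observed for the feature, the input is returned
    unchanged, since there is no minimum to derive a fill value from.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    fill = min(observed) / 2
    return [fill if v is None else v for v in values]


imputed = half_minimum_impute([4.0, None, 10.0])
```

      The rationale is that under MNAR a value is most likely missing because it fell below the detection limit, so a small value below the observed minimum is a plausible stand-in.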

      (11) (Planned)

      11- In the pre-processing and imputation stages - it would be interesting to see a summary table of how many features are left after each stage.

      This is a good suggestion and refers to the steps described in Fig. 3a. We will create an overview table for this, add it into the Extended Data Table and refer to it in the results section.

      (12)

      12- Is there a reason not to do UMAP or PLS-DA graphs for outlier detection? Doing more than PCA would help to have more confidence in removing or retaining outliers in the cases where biological relevance is borderline.

      The reason we decided to use PCA was the standardly used combination with the Hotelling T2 outlier testing. Since PCA is a linear dimensionality reduction technique that preserves the overall variance in the data and has a clear mathematical foundation linked to the covariance structure, it specifically fits the required assumptions of the Hotelling T2 outlier testing. Indeed, Hotelling T2 relies on the properties of the covariance matrix and the assumption of a multivariate Gaussian distribution. UMAP is a non-linear dimensionality reduction technique, which prioritizes preserving local and global structures in a way that often results in good clustering visualization, but it distorts distances between clusters and does not have the same rigorous statistical underpinnings as PCA. In terms of PLS-DA, which focuses on maximizing the covariance between variables and the class labels, even though not commonly done, one could use the optimal latent variables for discrimination and apply Hotelling's T² to those latent variables. Yet, PLS-DA is supervised and actively tries to separate data points in the latent space, which can be misleading for outlier detection where methods like PCA that are unbiased, unsupervised and preserve global variance are advantageous.
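      To make the link between PCA scores and Hotelling's T² concrete, here is a small sketch. It assumes the principal component scores have already been computed elsewhere and are uncorrelated, in which case T² for a sample is the sum of its squared, centred scores divided by each component's variance. The scores are made up, and the critical value (usually derived from an F distribution) is left out.

```python
from statistics import mean, variance


def hotelling_t2(scores):
    """Hotelling's T² per sample from uncorrelated PC scores.

    scores: list of tuples, one tuple of PC scores per sample.
    Each component is centred and scaled by its sample variance.
    """
    n_components = len(scores[0])
    columns = [[s[a] for s in scores] for a in range(n_components)]
    means = [mean(col) for col in columns]
    variances = [variance(col) for col in columns]
    return [
        sum((s[a] - means[a]) ** 2 / variances[a] for a in range(n_components))
        for s in scores
    ]


# Made-up scores on two components; the last sample sits far away on PC1.
pc_scores = [
    (-1.0, 0.2), (-0.8, -0.1), (-1.2, 0.0),
    (-0.9, 0.1), (-1.1, -0.2), (5.0, 0.0),
]
t2 = hotelling_t2(pc_scores)
most_extreme = t2.index(max(t2))
```

      A sample would be flagged as a potential outlier when its T² exceeds the chosen critical value; here the sketch only ranks the samples.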

      (13)

      13- Metadata vs metabolite features - can this be used beyond metabolomics (i.e. proteomics, transcriptomics, etc)? It can be always very useful when there are many metadata features and it's hard to pre-select beforehand which ones are the most biologically relevant.

      Yes, definitely. In fact, we have used the metadata analysis strategy also with proteomics data and it will work equally with any omics data type.

      (14)

      14- While authors discussed what KEGG pathways were significantly deregulated, it would be interesting to see all the pathways that were affected (e.g. aPEAR "bubble" graphs can show this (https://github.com/kerseviciute/aPEAR) , or something similar to NES scores). I appreciate the trickiness of it, but it would be quite interesting to see how authors e.g. Figure5e narrowed it down to the two pathways and how all the others looked like.

      We thank the reviewer for the suggestion of the aPEAR graphs. Following this suggestion, we have implemented a new function to enable clustering of the pathways based on overlapping metabolites (cluster_pk()). For more details regarding the method see also our response to Reviewer 1 (Comment 12) and our extended method section "Metabolite-set clustering" (Lines 656-671). We visualize the clustering results as a network graph, which we also included into Fig. 5f.
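      One common way to cluster metabolite-sets by overlapping members is to compute pairwise Jaccard similarity and group sets that exceed a threshold. The sketch below illustrates that general idea only; it is an assumption for illustration and is not taken from cluster_pk(), and the set names, members and the 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: shared members divided by all members."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sets(sets, threshold=0.5):
    """Group sets whose similarity reaches the threshold.

    Sets are nodes, similar pairs are edges, and the clusters are the
    connected components of that graph (found with a union-find).
    """
    names = list(sets)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(sets[a], sets[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


toy_sets = {
    "Set A": {"m1", "m2", "m3"},
    "Set B": {"m2", "m3", "m4"},
    "Set C": {"m9"},
}
clusters = cluster_sets(toy_sets)
```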

      The complete result of the KEGG enrichment can be found in Extended Data Table 1, Sheet 13 (Pathway enrichment analysis using KEGG on Young patient subset). The pathways are ranked by p.adjusted value and also include a score (FoldEnrichment) from the Fisher's exact test (similar to NES scores in GSEA). Here one can find a total of seven pathways with a p.adjusted value For Fig. 5e we narrowed down to these two pathways based on the previous findings of dysregulated dipeptides (Fig. 5d), as we searched for a potential explanation of this observation.
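      For orientation, an over-representation test of this kind can be sketched with the hypergeometric distribution, which gives the one-sided Fisher's exact p-value. The fold enrichment below uses one common definition (observed fraction over expected fraction); whether it matches the FoldEnrichment column exactly is an assumption, and the counts are made up.

```python
from math import comb


def over_representation(hits, set_size, n_selected, n_background):
    """One-sided Fisher's exact test for over-representation.

    hits:         selected metabolites that belong to the pathway
    set_size:     pathway members present in the background
    n_selected:   number of selected (e.g. significant) metabolites
    n_background: number of all measured metabolites
    Returns (p_value, fold_enrichment).
    """
    total = comb(n_background, n_selected)
    upper = min(set_size, n_selected)
    # P(X >= hits) under the hypergeometric null distribution.
    p_value = sum(
        comb(set_size, k) * comb(n_background - set_size, n_selected - k)
        for k in range(hits, upper + 1)
    ) / total
    fold = (hits / n_selected) / (set_size / n_background)
    return p_value, fold


# Made-up example: all 5 selected metabolites fall into a 5-member pathway
# out of 10 measured metabolites.
p_value, fold = over_representation(5, 5, 5, 10)
```

      The p-values from all tested pathways would then be adjusted for multiple testing before ranking.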

      (15)

      15- Could you comment on the runtime of the pipeline? In particular, do the additional translation steps and use of multiple databases substantially affect computational speed?

      Downloading and parsing databases takes significant time; large ones such as RaMP or HMDB can take minutes on a standard laptop. Our local cache speeds up the process by eliminating the need for repeated downloads. In the future, database access will be even faster: according to our plans, all prior knowledge will be accessible in an already parsed format through our own API (omnipathdb.org). The ambiguity analysis, which is a complex data transformation pipeline, and plotting with ggplot2, another key component of MetaProViz, are the slowest parts, especially when an analysis is performed for the first time and no cache can be used. This means there are a few slow operations, which complete in at most a few dozen seconds. However, the implementation and speed of these solutions do not fall behind what we commonly find in bioinformatics packages and, most importantly, the speed of MetaProViz does not pose an obstacle to its efficient use in analysis pipelines.

      (16)

      16- I clap to the authors for automated checks if selected methods are appropriate!

      Thank you, this is something we think is important to ensure correct analysis and circumvent misinterpretation.

      (17)

      17- My suggestion would be to also look into power calculation or p-value histogram. In your example you saw some clear signal, but very frequently research studies are under-sampled and while effect can be clearly seen, there are just not enough samples to have statistically significant hits.

      We fully agree that power calculations are very important. Yet, this should ideally happen prior to the user's experiment. MetaProViz analysis starts at a later time-point and power calculations should have been done before. In regards to p-value histogram, we have implemented a similar measure, namely a density plot, which is plotted as a quality control measure within MetaProViz differential analysis function. The density plot is a smoothed version of a histogram that represents the distribution as a continuous probability density function and can be used to assess whether the p-values follow a uniform distribution.
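      The binned counterpart of such a density plot is easy to sketch. The snippet below is an illustration with made-up p-values and a hypothetical function name, not the MetaProViz plotting code: under the null hypothesis the bins should be roughly equally filled, whereas a spike in the first bin suggests true signal.

```python
def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # p == 1.0 would index one past the end, so clamp to the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


counts = pvalue_histogram([0.01, 0.02, 0.5, 1.0])
```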

      (18)

      18- Overall functional parts are novel and next step in helping with data interpretability, but I still found it hard to read into functionally clear insights (re to pathways / functional groupings of metabolites) - especially as you have e.g. enzyme-metabolite databases etc. I think clarity there could be improved and would help to get your message more widely across.

      Regarding the clarity of the pathway enrichment and its functional insights, we have extended the figure legends of Fig. 4 and 5, clearly stated that MetalinkDB is the prior knowledge resource we used for the functional interpretation to identify the links for methionine (Lines 367-368), and extended our summary statement to highlight that we combine the biological clustering with prior knowledge for the mechanistic insight (Lines 380-381).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Schmidt et al. present MetaProViz, a comprehensive and modular platform for metabolomics data analysis. The tool provides a full suite of processing capabilities spanning metabolite annotation, quality control, normalization, differential analysis, integration of prior knowledge, functional enrichment, and visualization. The authors also include example datasets, primarily from renal cancer studies, to demonstrate the functionality of the pipeline. The MetaProViz framework addresses several long-standing challenges in metabolomics data analysis, particularly issues of reproducibility, ambiguous metabolite annotation, and the integration of metabolite features with pathway knowledge. The platform is likely to be a valuable addition for the community, but the reviewer has some comments that need to be addressed prior to publication.

      The section "Improving the connection between prior knowledge and metabolomics features" could benefit from additional clarification. It is not entirely clear to the reader what specific steps were taken beyond using RaMP-DB to translate metabolite identifiers. For example, how exactly were ambiguous mappings ("different scenarios") handled in practice, and to what extent does this process "fix" or merely flag inconsistencies? A more explicit description or example of how MetaProViz resolves these cases would help readers better understand the improvements claimed.

      The introduction of MetSigDB is intriguing, but its construction and added value are not sufficiently described. It would be helpful to clarify what specific advantages MetSigDB provides over directly using existing pathway resources such as KEGG, Reactome, or WikiPathways. For example, how many features, interactions, or metabolite-set relationships are included, and in what way are these pathways improved or extended compared to those already available in public databases?

      Figure 1D/1E: The reviewer appreciates the inclusion of the visualizations illustrating the different mapping scenarios, as these effectively convey the complexity of metabolite ID translation. However, it took some time to interpret what each scenario represented. It would be helpful to include brief annotations or explanatory text directly on the figures to clarify what each scenario depicts and how it relates to the underlying issue being addressed.

      "By assigning other potential metabolite IDs and by translating between the present ID types, we not only increase the number of features within all ID types but also increase the feature space with HMDB and KEGG IDs (Fig. 2a, right, SFig. 2 and Supplementary Table 1)". The reviewer would appreciate additional clarification on how this was done. It is not clear what specific steps or criteria were used to assign additional metabolite IDs or to translate between identifier types. The reviewer also appreciates the inclusion of the UpSet plots. However, simply having the plots side-by-side makes it difficult to determine the specific differences. An alternative visualization, such as stacked bar plots, scatter plots summarizing the changes in feature counts, or other representation that more clearly highlights the deltas, might make these results easier to interpret.

      MetaboAnalyst is mentioned several times in the manuscript. The reviewer is familiar with some of the limitations and practical challenges associated with using MetaboAnalyst and its R package. Given that MetaboAnalyst already offers some overlapping functionality with MetaProViz (and offers it in the form of an interactive website and a sometimes functional R package), a more explicit comparison between the two tools would help readers fully understand the unique advantages and improvements provided by MetaProViz.

      Page 11: The authors state that they used limma for statistical testing, including for the analysis of exometabolomics data, where the values appear to represent log2-transformed distances or ratios rather than normally distributed intensities. Since limma assumes approximately normal residuals, please provide evidence or justification that this assumption holds for these data types. If the distributions deviate substantially from normality, a non-parametric alternative might be more appropriate.

      Page 13: why were young and old defined this way? Authors should provide their reasoning and/or citations for this grouping.

      Figure 4e: It may help with interpretation to have these Sankey-like graph edges be proportional to the number of metabolites.

      Figure 4h: The values appear to be on an intensity scale (e.g., on the order of 3e10), yet some of them are negative, which would not be expected for raw or log-transformed mass spectrometry intensities. It is unclear whether these represent normalized abundance values, distances, or some other transformation. In addition, for the comparison of tumour versus normal tissue, it is not specified what statistical test was applied. Since mass spectrometry data are typically log2-transformed to approximate a log-normal distribution before performing t-tests or similar parametric methods, clarification is needed on how these data were processed.

      Figure 5: "Tukey's p.adj < 0.05" . Was this a Tukey's post-hoc test? This should be explicitly stated.

      The potential for multi-omics is mentioned. Please clarify how generalizable this framework is. Can it readily accommodate transcriptomics, proteomics, or fluxomics data, or does it require custom logic or formatting for each new data type?

      Please clarify if/how enrichment analyses account for varying set sizes and redundant metabolite memberships across pathways, which can bias over-representation analysis results.

      Significance

      The MetaProViz framework addresses several long-standing challenges in metabolomics data analysis, particularly issues of reproducibility, ambiguous metabolite annotation, and the integration of metabolite features with pathway knowledge. The platform is likely to be a valuable addition for the community, but the reviewer has some comments that need to be addressed prior to publication.

      Authors should be commended for the availability of data/code and detailed methods. Clarity is good. Authors have clearly spent a lot of time thinking about the challenges of metabolomics data analysis.

    1. byte code: An intermediate language between source code and object code. Many modern languages first compile source code into byte code and then interpret the byte code with a program called a virtual machine.

      Bytecode is a low-level, intermediate representation of source code, optimized for execution by a virtual machine (VM) rather than direct hardware processing.
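      Python is one such language, and its standard library lets you look at the bytecode directly. The sketch below is specific to CPython; the function `add` is just an example, and the exact instruction names differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# Human-readable instruction names that the virtual machine will execute.
instructions = [ins.opname for ins in dis.get_instructions(add)]

# The same program as the raw byte string stored on the code object.
raw = add.__code__.co_code

print(instructions)
print(raw)
```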

    1. The way we present ourselves to others around us (our behavior, social role, etc.) is called our public persona. We also may change how we behave and speak depending on the situation or who we are around, which is called code-switching.

      I like the point that code-switching and “putting on a persona” can still be authentic because different communities have different norms for what sincere expression looks like. Context collapse on social media seems like a platform-caused pressure toward a single “flattened” self; it would be interesting to discuss which design features (audience controls, friction for resharing, clearer context cues) could reduce that pressure without isolating people into echo chambers.

    1. The code is used to resolve only cases brought to the courts, which are usually decided by judges without a jury.

      Courts in civil law systems usually decide cases with a judge and no jury.

    1. Matillion provides a Low-Code/No-Code interface that allows data engineers to build these pipelines

      Think we do want data engineers for the actual data piece of AI enablement

    1. To use loop variables, we create a variable before our loop, and give it an initial value (often 0). Then within the loop over each item in our list, we can optionally add something to our loop variable. After the loop, our variable will have our final result.

      This example shows how a loop variable acts like a running total: it starts at 0, updates each time a condition is met, and stores the final count after the loop ends. The code is especially clear because it combines iteration (for letter in "Mississippi") with a conditional check (if letter == "i"), which is a common beginner pattern. You could make it even stronger by noting that this same structure works for counting anything in a list or string, not just letters.
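      A sketch of the pattern being described, with the generalization suggested above. The variable name `count` and the helper `count_matching` are assumptions, since the original code is not quoted here.

```python
# Running total: start at 0 and add 1 each time the condition is met.
count = 0
for letter in "Mississippi":
    if letter == "i":
        count += 1
print(count)


def count_matching(items, target):
    """The same structure works for any list or string, not just letters."""
    total = 0
    for item in items:
        if item == target:
            total += 1
    return total


cats = count_matching(["cat", "dog", "cat"], "cat")
```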

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The study provides a comprehensive overview of genome size variation in two related species of the genus Epidendrum, which appear to be homoploid, although their DNA content more closely corresponds to that of heteroploid species. While I have a few serious concerns regarding the data analysis, the study itself demonstrates a well-designed approach and offers a valuable comparison of different methods for genome size estimation. In particular, I would highlight the analysis of repetitive elements, which effectively explains the observed differences between the species. However, I encourage the authors to adopt a more critical perspective on the k-mer analysis and the potential pitfalls in data interpretation.

      Major comments:

      R1. p. 9: Genome size estimation via flow cytometry is an incorrect approach. The deviation is approximately 19% for E. anisatum and about 25% for E. marmoratum across three repeated measurements of the same tissue over three days? These values are far beyond the accepted standards of best practice for flow cytometry, which recommend a maximum deviation of 2-5% between repeated measurements of the same individual. Such variability indicates a systemic methodological issue or improper instrument calibration. Results with this level of inconsistency cannot be considered reliable estimates of genome size obtained by flow cytometry. If you provide the raw data, I can help identify the likely source of error, but as it stands, these results are not acceptable.

      A: Thanks a lot for pointing out this issue. We have identified the source of the wide interval after consulting with the staff of LabNalCit. We originally used human peripheral blood mononuclear cells (PBMCs) as a reference to estimate the genome size (GS) of P. sativum and used the resulting range to estimate the GS of Epidendrum. We calculated P. sativum's GS using a wide human GS range of 6-7 Gb, which resulted in a wide range of P. sativum GS and, consequently, in a wide range of GS for our samples. Therefore, the wide range reported is not an issue with the instruments, but with the specifics of the analysis.

      We have made the following changes:

      1. We reduced the calculated range of P. sativum's GS by using a narrower human genome size range (6.41-6.51 Gb; Piovesan et al. 2019; DOI: 10.1186/s13104-019-4137-z), and used these intervals to calculate our samples' GS.
      2. We have explained our procedure in the methods, changed our results as required, and included a supplementary table with cytometry data (Supplementary Data Table 1).
      3. Human peripheral blood mononuclear cells (PBMCs) from healthy individuals were used as a standard laboratory reference to calculate the P. sativum genome size. Pisum sativum and the Epidendrum samples were analyzed in a CytoFLEX S flow cytometer (Beckman-Coulter), individually and in combination with the internal references (PBMCs and P. sativum, respectively). Cytometry data analysis was performed using FlowJo® v. 10 (https://www.flowjo.com/). A genome size value for the Epidendrum samples was calculated as the average of the minimum and maximum 1C/2C values obtained from three replicates of the DNA content histograms of each tissue sample. Minimum and maximum values come from the interval of P. sativum estimations based on the human genome size range (human genome size range: 6.41-6.51; Piovesan et al. 2019).
      4. The 1C value in gigabases (Gb; calculated from mass in pg) of E. anisatum ranged from 2.55 to 2.62 Gb (mean 1C value = 2.59 Gb) and that of E. marmoratum from 1.11 to 1.18 Gb (mean 1C value = 1.13 Gb; Supplementary Data Table S1).
      5. We also eliminated from Figure 3 the range we had estimated previously.
      6. Finally, we changed the focus of the comparison and discussion of the evaluation of the bioinformatic estimations, highlighting this deviation rather than whether the GS bioinformatic estimations fall within the cytometric interval. We calculated the Mean Absolute Deviation (MAD) as the absolute difference between the genome size estimates using k-mers and flow cytometry. This meant changing the results in P. 11 and 12 and adding to Fig. 3 two boxplots depicting the MAD. We have also added Supplementary Data Fig. S3 depicting the absolute deviations for E. anisatum and E. marmoratum per tool using the estimates generated from a k-mer counting with a maximum k-mer coverage value of 10,000 using 16 different values of k; a Supplementary Data Figure S5 depicting the mean absolute deviations resulting from the different subsampled simulated depths of coverage of 5×, 10×, 20×, 30×, and 40×; and finally a Supplementary Data Fig. S6 depicting the MAD changes as a function of depth of coverage for E. anisatum and E. marmoratum.
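      The arithmetic behind these changes can be sketched as follows. The reference-ratio formula and the conversion 1 pg ≈ 0.978 Gb (Doležel et al. 2003) are standard in flow cytometry; the k-mer estimates in the example are made up, and only the 2.59 Gb cytometric mean for E. anisatum comes from the values reported above.

```python
PG_TO_GB = 0.978  # 1 pg of DNA corresponds to about 0.978 Gb (Doležel et al. 2003)


def pg_to_gb(picograms):
    """Convert a DNA mass in picograms to gigabases."""
    return picograms * PG_TO_GB


def sample_genome_size(reference_size, sample_peak, reference_peak):
    """Genome size from the ratio of fluorescence peak positions."""
    return reference_size * sample_peak / reference_peak


def mean_absolute_deviation(kmer_estimates, cytometry_estimate):
    """Mean absolute difference between k-mer estimates and flow cytometry."""
    deviations = [abs(e - cytometry_estimate) for e in kmer_estimates]
    return sum(deviations) / len(deviations)


# Hypothetical k-mer estimates (Gb) compared with the reported mean 1C value.
mad = mean_absolute_deviation([2.0, 2.4, 3.0], 2.59)
```

      Propagating the lower and upper bound of the reference genome size through `sample_genome_size` gives the minimum and maximum sample estimates, which is why a narrower reference range narrows the reported interval.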

      R1. p. 14 and some parts of Introduction: It may seem unusual, to say the least, to question genome size estimation in orchids using flow cytometry, given that this group is well known for extensive endoreplication. However, what effect does this phenomenon have on genome size analyses based on k-mers, or on the correct interpretation of peaks in k-mer histograms? How can such analyses be reliably interpreted when most nuclei used for DNA extraction and sequencing likely originate from endoreplicated cells? I would have expected a more detailed discussion of this issue in light of your results, particularly regarding the substantial variation in genome size estimates across different k-mer analysis settings. Could endoreplication be a contributing factor?

      A:

      1. We reworded the introduction (p. 3, 2nd paragraph) to make our point on the effect of endoreplication on flow cytometry clearer.
      2. We eliminated the following sentence from the discussion (p. 15): "Difficulties for cytometric estimation of genome size can thus be taxon-specific. Therefore, cross-validating flow cytometry and bioinformatics results can be the most effective method for estimating plant genome size, especially when only tissues suspected to show significant endoreplication, such as leaves, are available"
      3. We added the following (p. 18): Genome size estimation for non-model species is considered a highly standardized approach. However, tissue availability and intrinsic genome characteristics (large genomes, polyploidy, endoreplication, and the proportion of repetitive DNA) can still preclude genome size estimation (e.g. Kim et al. 2025) using cytometry and bioinformatic tools. Cross-validating flow cytometry and bioinformatics results might be particularly useful in those cases. For example, when only tissues suspected of showing significant conventional endoreplication, such as leaves, are available, bioinformatic tools can help to confirm that the first peak in cytometry histograms corresponds to 2C. Conversely, bioinformatic methods can be hindered by partial endoreplication, which only flow cytometry can detect.
      4. We included a paragraph discussing the effect of CE and PE on bioinformatic GS estimation (p. 17):

      Besides ploidy level, heterozygosity, and the proportion of repetitive DNA, k-mer distribution can be modified by endoreplication. Since endoreplication of the whole genome (CE) produces genome copies (as in preparation for cell division, but nuclear and cell division do not occur), we do not expect an effect on genome size estimates based on k-mer analyses. In contrast, PE alters coverage of a significant proportion of the genome, affecting k-mer distributions and genome size estimates (Piet et al., 2022). Species with PE might be challenging for k-mer-based methods of genome size estimation.
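      A toy calculation illustrates the reasoning. It uses the simplest k-mer estimator (total k-mer count divided by the coverage of the main peak) on made-up histograms; real tools additionally model sequencing errors, heterozygosity and repeats, so this is only a sketch of the argument, not of any tool's algorithm.

```python
def kmer_genome_size(histogram):
    """Naive estimate: total k-mer occurrences divided by the peak coverage.

    histogram maps a coverage value to the number of distinct k-mers
    observed at that coverage; the peak is taken as the mode.
    """
    total = sum(coverage * n for coverage, n in histogram.items())
    peak = max(histogram, key=histogram.get)
    return total / peak


# Toy genome of 1000 distinct k-mers sequenced at 20x.
baseline = kmer_genome_size({20: 1000})

# Whole-genome endoreplication (CE): every k-mer doubles in coverage,
# so the total and the peak scale together and the estimate is unchanged.
conventional = kmer_genome_size({40: 1000})

# Partial endoreplication (PE): only 600 of the k-mers double in coverage,
# which shifts the peak and distorts the estimate.
partial = kmer_genome_size({40: 600, 20: 400})
```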

      R1. You repeatedly refer to the experiment on genome size estimation using analyses with maximum k-mer coverage of 10,000 and 2 million, under different k values. However, I would like to see a comparison - such as a correlation analysis - that supports this experiment. The results and discussion sections refer to it extensively, yet no corresponding figure or analysis is presented.

      A:

      1. We had previously included the results of the analyses using different k-mer coverage in Supplementary Data Figure S2. To formally compare the results of the analyses with maximum k-mer coverage of 10,000 and 2 million, we have added a Wilcoxon paired signed-rank test, which showed a significant difference (p. 12): The estimated genome sizes using a maximum count value of 10,000 were generally lower for all tools in both species compared to using a maximum count value of 2 million (median of 2M experiment genome size - median of 10K experiment genome size = 0.24 Gb). The estimated genome size of the 2 million experiment also tended to be closer to the flow cytometry genome size estimation, with significantly lower MAD than the 10K experiment (Wilcoxon paired signed-rank test p = 0.0009). In the 10K experiment (Supplementary Data Figure S2; S3), the tool with the lowest MAD for E. anisatum was findGSE-het (0.546 Gb) and for E. marmoratum it was findGSE-hom (0.116 Gb).
      2. We have added a boxplot in Supplementary Data Figure S3 depicting the mean absolute deviations using maximum k-mer coverage of 10,000 and 2 million compared to flow cytometry.

      Minor comments:

      R1. p. 3: You stated: "Flow cytometry is the gold standard for genome size estimation, but whole-genome endoreplication (also known as conventional endoreplication; CE) and strict partial endoreplication (SPE) can confound this method." How did you mean this? Endopolyploidy is quite common in plants and flow cytometry is an excellent tool how to detect it and how to select the proper nuclei fraction for genome size estimation (if you are aware of possible misinterpretation caused by using inappropriate tissue for analysis). The same can be applied for partial endoreplication in orchids (see e.g. Travnicek et al 2015). Moreover, the term "strict partial endoreplication" is outdated and is only used by Brown et al. In more recent studies, the term partial endoreplication is used (e.g. Chumova et al. 2021- 10.1111/tpj.15306 or Piet et al. 2022 - 10.1016/j.xplc.2022.100330).

      A:

      We have reworded the paragraph where we stated "Flow cytometry is the gold standard for genome size estimation", as in the answer to Major comment 2. Additionally, we highlighted in the discussion (p. 18) that, while FC is the gold standard for GS estimation, studying multiple alternatives to it may be important for cases in which live tissue is not available or is available only to a limited extent (i.e. only certain tissues). We have also changed the term "strict partial endoreplication" to partial endoreplication (PE).

      R1. p. 5: "...both because of its outstanding taxic diversity..." There is no such thing as "taxic" diversity - perhaps you mean taxonomic diversity or species richness.

      A: We have changed "taxic diversity" to "species diversity".

      R1. p. 6: In description of flow cytometry you stated: "Young leaves of Pisum sativum (4.45 pg/1C; Doležel et al. 1998) and peripheral blood mononuclear cells (PBMCs) from healthy individuals...". What does that mean? Did you really use blood cells? For what purpose?

      A: Please find the explanation and the modifications we've made in the answer to major comment 1.

      R1. p. 7: What do you mean by this statement "...reference of low-copy nuclear genes for each species..."? As far as I know, the Granados-Mendoza study used the Angiosperm v.1 probe set, so did you use that set of probes as reference?

      A: We rewrote: "To estimate the allele frequencies, the filtered sequences were mapped to a reference of low-copy nuclear genes for each species" to:

      To estimate the allele frequencies, the filtered sequences were mapped to the Angiosperm v.1 low-copy nuclear gene set of each species.

      R1. p. 7: Chromosome counts - there is a paragraph of methodology used for chromosome counting, but no results of this important part of the study.

      A: We are including a supplementary figure (Supplementary Data Figure 7) with micrographs of the chromosomes of E. anisatum and E. marmoratum.

      R1. p. 12: Depth of coverage used in repeatome analysis - why did you use different coverage for the two species? An explanation is needed.

      A: To make explicit the fact that the depth of coverage is determined automatically by the analysis with no consideration for the amount of input reads, but only of the graph density and the amount of RAM available (Box 3 in Novak et al. 2020), we rewrote:

      "To estimate the proportion of repetitive DNA, the individual protocol analyzed reads corresponding to depths of coverage of 0.06× for Epidendrum anisatum and 0.43× for E. marmoratum." to

      To estimate the proportion of repetitive DNA, the RepeatExplorer2 individual protocol determined a max number of analyzed reads (Nmax) corresponding to depths of coverage of 0.06x for Epidendrum anisatum and 0.43x for E. marmoratum.

      R1. p. 16: The variation in genome size of orchids is even higher, as the highest known DNA amount has been estimated in Liparis purpureoviridis - 56.11 pg (Travnicek et al 2019 - doi: 10.1111/nph.15996)

      A: We have updated it.

      R1. Fig. 1 - Where is the standard peak on Fig. 1? You mention it explicitly on page 9 where you are talking about FCM histograms.

      A: We reworded the results, eliminating the references to the standard internal reference.

      Reviewer #1 (Significance (Required)):

      Significance

      This study provides a valuable contribution to understanding genome size variation in two Epidendrum species by combining flow cytometry, k-mer analysis, and repetitive element characterization. Its strength lies in the integrative approach and in demonstrating how repetitive elements can explain interspecific differences in DNA content. The work is among the first to directly compare flow cytometric and k-mer-based genome size estimates in orchids, extending current knowledge of genome evolution in this complex plant group. However, the study would benefit from a more critical discussion of the limitations and interpretative pitfalls of k-mer analysis and from addressing methodological inconsistencies in the cytometric data. The research will interest a specialized audience in plant genomics, cytogenetics, and genome evolution, particularly those studying non-model or highly endoreplicated species.

      Field of expertise: plant cytogenetics, genome size evolution, orchid genomics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      With this work, the authors provide genome profiling information on the Epidendrum genus. They performed low-coverage short read sequencing and analysis, as well as flow cytometry approaches to estimate genome size, and performed a comparative analysis of these methods. They also used the WGS dataset to test different approaches and models for genome profiling, as well as repeat abundance estimation, emphasising the importance of genome profiling to provide basic and comparative genomic information in non-model study species. Results show that the two "closely-related" Epidendrum species analysed (E. marmoratum and E. anisatum) have different genome profiles, exhibiting a 2.3-fold genome size difference, mostly triggered by the expansion of repetitive elements in E. marmoratum, especially of Ty3-Gypsy LTR-retrotransposons and a 172 bp tandem repeat (satellite DNA).

      Major comments:

      Overall, the manuscript is well-written, the aim, results and methods are explained properly, and although I missed some information in the introduction, the paper structure is overall good, and it doesn't lack any important information. The quality of the analysis is also adequate and no further big experiments or analysis would be needed.

      However, from my point of view, two main issues would need to be addressed:

      R2. The methods section is properly detailed and well explained. However, the project data and scripts are not available at the figshare link provided, and the BioProject code provided is not found at SRA. This needs to be solved as soon as possible: if they are not available for review, the reproducibility of the manuscript cannot be fully assessed.

      A: We have made public the .histo files for all depths of coverage and the cluster table files necessary to reproduce the results. We will also make public a fraction of the sequencing data sufficient to reproduce our genome size and repetitive DNA results as soon as the manuscript is formally published. Availability of the whole dataset is pending publication of the whole-genome draft.

      R2. The authors specify in the methods that 0.06x and 0.43x sequencing depths were used as inputs for the RE analysis of E. anisatum and E. marmoratum. I understand these are differences based on the data availability and genome size differences. However, they don't correspond to either of the recommendations from Novak et al (2020):

      In the context of individual analysis: "The number of analyzed reads should correspond to 0.1-0.5× genome coverage. In the case of repeat-poor species, coverage can be increased up to 1.0-1.5×." Therefore, using 0.06x for E. anisatum should be justified, or at least addressed in the discussion.

      Moreover, using such a difference in coverage might affect any comparisons made using these results. Given that the number of reads is not limiting in this case, the reason for using such specific coverages should be discussed in detail.

      In the context of comparative analysis: "Because different genomes are being analyzed simultaneously, the user must decide how they will be represented in the analyzed reads, choosing one of the following options. First, the number of reads analyzed from each genome will be adjusted to represent the same genome coverage. This option provides the same sensitivity of repeat detection for all analyzed samples and is therefore generally recommended; however, it requires that genome sizes of all analyzed species are known and that they do not substantially differ. In the case of large differences in genome sizes, too few reads may be analyzed from smaller genomes, especially if many species are analyzed simultaneously. A second option is to analyze the same number of reads from all samples, which will provide different depth of analysis in species differing in their genome sizes, and this fact should be considered when interpreting analysis results. Because each of these analysis setups has its advantages and drawbacks, it is a good idea to run both and cross-check their results."

      Therefore, it should be confirmed how many reads were used for this approach (the methods specify this only for the individual analysis), and why.

      A: In Box 3, Novak et al (2020) explain that the number of analyzed reads (Nmax) is determined automatically by RepeatExplorer2, based on the graph density and available RAM. Therefore, the reported depths of coverage are results, not inputs, of the analysis. We tried different amounts of reads as input and got consistently similar results, so we kept the analysis using the whole dataset.

      For the comparative analysis, we have added the resulting depth of coverage and explained that we used the same number of reads for both species.

      Added to methods:

      "For the comparative protocol, we used the same amount of reads for both species".

      Added to results:

      "To estimate the proportion of repetitive DNA, the RepeatExplorer2 individual protocol determined a maximum number of analyzed reads (Nmax) corresponding to depths of coverage of 0.06x for E. anisatum and 0.43x for E. marmoratum."

      "The RepeatExplorer2 comparative protocol determined a maximum number of analyzed reads (Nmax) corresponding to depths of coverage of approximately 0.14x for E. marmoratum and 0.06x for E. anisatum."

      This is consistent with other works which utilize RepeatExplorer2, for example, Chumová et al (2021; https://doi.org/10.1111/tpj.15306), who wrote: "The final repeatome analysis for each species was done using a maximum number of reads representing between 0.049x and 1.389x of genome coverage."

      Minor comments:

      General comments:

      • The concept of genome endoreplication and the problem it represents for C-value estimations needs to be better contextualised. It would be nice to have some background information in the introduction on how this is an issue (especially in orchid species). Results shown are valuable and interesting but require a little more context on how frequent this is in plants, especially in orchids, and across different tissues.

      A: We have included information about the variation of conventional and partial endoreplication in plants:

      Differences in CE may also occur between individuals or even respond to environmental factors (Barow 2006). In contrast, PE results in cells that replicate only a fraction (P) of the genome (Brown et al. 2017) and it has only been reported in Orchidaceae (Brown et al. 2017). CE and PE can occur in one or several endoreplication rounds, and different plant tissues may have different proportions of 2C, 4C, 8C ... nC or 2C, 4E, 8E, ... nE nuclear populations, respectively. The 2C nuclear population sometimes constitutes only a small fraction in differentiated somatic tissues and can be overlooked by cytometry (Trávníček et al. 2015). Using plant tissues with a high proportion of the 2C population (such as orchid ovaries and pollinaria) can help overcome this difficulty (Trávníček et al. 2015; Brown et al. 2017).
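
The nuclear populations listed in this paragraph can be made concrete with a small numerical sketch. It assumes a simple model that the response does not spell out: in conventional endoreplication every round doubles the whole genome, whereas in partial endoreplication only a fraction P of the genome is copied in each round while the rest stays at its 2C amount. The function name and the example value P = 0.5 are illustrative and are not taken from the manuscript.

```python
def nuclear_dna_content(two_c: float, rounds: int, p: float = 1.0) -> float:
    """DNA content of a nucleus after `rounds` endoreplication rounds.

    Assumed model: a fraction `p` of the genome doubles in every round,
    and the remaining (1 - p) stays at its 2C amount.
    p = 1.0 gives conventional endoreplication (2C, 4C, 8C, ...);
    0 < p < 1 gives partial endoreplication (2C, 4E, 8E, ...).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return two_c * ((1.0 - p) + p * 2 ** rounds)


if __name__ == "__main__":
    two_c = 2.0  # hypothetical 2C content, arbitrary units
    for k in range(4):
        ce = nuclear_dna_content(two_c, k)         # conventional endoreplication
        pe = nuclear_dna_content(two_c, k, p=0.5)  # partial, hypothetical P
        print(f"round {k}: CE = {ce:.2f}, PE = {pe:.2f}")
```

Under this assumed model, the ratio between the first endoreplicated peak and the 2C peak is 1 + P, so a ratio below 2 between consecutive peaks would point to partial rather than conventional endoreplication.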

      Comments and suggestions on the figures:

      R2. In fig 1, the flow cytometry histograms need to be more self-explanatory. What are the Y axis "counts" of? Also, please either place the label for both rows or for each, but don't make it redundant. The axis fonts need to be made a bit larger too. If possible, explain briefly in the figure legend (and not only in the text) what each peak means.

      A: We have modified the figure adding legends for Y and X axes, eliminated redundant labels, and changed the font size.

      R2. Fig 5. Horizontal axis labels are illegible. Please make these larger (maybe make the plot wider by moving the plot legend to the top/bottom of the figure? - just a suggestion).

      A: We consider the horizontal axis label to be superfluous and we removed it.

      Small text editing suggestions:

      R2. Methods, "Ploidy level estimation and chromosome counts" section. It would be easier for the reader if this paragraph were either divided into two methods sections, or into two paragraphs at least, since these are two very different approaches and provide slightly different data or information.

      A: We slightly modified: "Chromosome number was counted from developing root tips" to

      "Additionally, to confirm ploidy level, chromosome number was counted from developing root tips" and changed the subtitle to only "Ploidy level estimation".

      R2. Methods, "Genome size estimation by k-mer analysis" section. Please specify whether the coverage simulations (of 5x to 40x) were made based on 1c or 2c of the genome size? I assumed haploid genome size but best to clarify.

      A: We have added it to P7: "To assess the suitability of the whole dataset and estimate the minimum coverage required for genome size estimation, the depth of coverage of both datasets was calculated based on the flow cytometry 1C genome size values."
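
The depth calculation described in this answer is simple arithmetic: total sequenced bases divided by the 1C genome size. The sketch below shows that calculation, together with the fraction of reads to keep when subsampling a dataset down to a target depth. The read count and read length are hypothetical placeholders, and the function names are illustrative; only the 1C sizes (2.59 Gb and 1.13 Gb) come from the text.

```python
def depth_of_coverage(n_reads: int, read_length_bp: int, genome_size_1c_bp: float) -> float:
    """Depth of coverage relative to the 1C (haploid) genome size."""
    if genome_size_1c_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_1c_bp


def subsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep in order to reach `target_depth`."""
    if target_depth > current_depth:
        raise ValueError("target depth exceeds the depth of the full dataset")
    return target_depth / current_depth


if __name__ == "__main__":
    # 1C genome sizes from flow cytometry, as reported in the response.
    genome_sizes_bp = {"E. anisatum": 2.59e9, "E. marmoratum": 1.13e9}
    n_reads = 800_000_000  # hypothetical number of reads
    read_length = 150      # hypothetical read length in bp

    for species, size in genome_sizes_bp.items():
        depth = depth_of_coverage(n_reads, read_length, size)
        print(f"{species}: {depth:.1f}x with the full dataset")
        for target in (5, 10, 20, 40):
            frac = subsample_fraction(depth, target)
            print(f"  keep {frac:.3f} of the reads for {target}x")
```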

      R2. Results, "Genome size estimation by k-mer analysis and ploidy estimation" section. In the first two paragraphs, the results presented appear to conform to anticipated patterns based on known properties of these types of datasets. Although this information confirms expected patterns, it does not provide new or biologically significant insights into the genomes analysed. It may be beneficial to further summarize these paragraphs so that the focus of this section can shift toward the comparison of methods and the biological interpretation of the genome profiles of Epidendrum.

      A: We agree that those paragraphs deviate a little from the focus of our results. However, we believe they provide useful information both for pattern confirmation in a relatively understudied field and for readers who may not be very familiar with the methods utilized.

      R2. Discussion, "Genome size estimation using flow cytometry" section. In the second paragraph, it is discussed how potential endoduplication events can "trick" the flow cytometry measurements. This has probably previously been discussed on other C-value calculation studies and would benefit from context from literature. How does this endoduplication really affect C-value measurements across plant taxa? I understand it is a well-known issue, so maybe add some references?

      A: We have included in the Introduction information about CE and PE and their associated references (P. 3 and 4).

      __R2. __Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the second paragraph, when mentioning the relative abundance of Ty3-gypsy and Ty1-copia elements, it is also worth mentioning their differences in genomic distribution and the potential structural role of Ty3-gypsy elements.

      A: We added this paragraph in P.20:

      "Ty3-gypsy elements are frequently found in centromeric and pericentromeric regions, and may have an important structural role in heterochromatin (Jin et al. 2004; Neumann et al. 2011; Ma et al. 2023), particularly those with chromodomains in their structure (chromovirus, i.e. Tekay, CRM transposons; Neumann et al. 2011). Conversely, Ty1-copia elements tend to be more frequent in gene-rich regions (Wang et al. 2025A). However, Ty3-gypsy chromovirus elements can be found outside the heterochromatin regions (Neumann et al. 2011), and in Pennisetum purpureum (Poaceae) Ty1-copia elements are more common in pericentromeric regions (Yu et al. 2022)."

      R2. Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the third paragraph, it is mentioned that both species have 2n=40. I believe these are results from this work since there is a methods section for chromosome counting. This data should therefore go into results.

      A: We have added the chromosome count micrographs as Supplementary Data Fig. S7.

      R2. Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. I'd recommend expanding a bit more on repetitive DNA differences based on the RepeatExplorer results. Providing references on whether this has been found in other taxa would be helpful too. For example, Ogre bursts have been previously described in other species (e.g. legumes, Wang et al., 2025). Moreover, I consider worth highlighting and discussing other interesting differences found, such as the differences in unknown repeats (could be due to one species having "older" elements- too degraded to give any database hits- compared to the other), or Class II TE differences between species (and how these account less for genome size difference because of their size), etc.

      A: We have rearranged and expanded the discussion of the role of repetitive DNA in E. anisatum and E. marmoratum and how it relates to the repetitive DNA in other species. This includes Ogre transposons, an expanded Ty1-copia vs. Ty3-gypsy discussion, and a section on unclassified repeats, and can be found on P.19 to P.21.

      Reviewer #2 (Significance (Required)):

      Overall, this study provides a valuable contribution to our understanding of genome size diversity and repetitive DNA dynamics within Epidendrum, particularly through its combined use of low-coverage sequencing, flow cytometry, and comparative genome profiling. Its strongest aspects lie in the clear methodological framework and the integration of multiple complementary approaches, which together highlight substantial genome size divergence driven by repeat proliferation-an insight of clear relevance for orchid genomics and plant genome evolution more broadly.

      While the work would benefit from improved data availability, additional contextualization of the problem of endoreduplication in flow cytometry, and clarification of some figure elements and methodological details, the study nonetheless advances the field by presenting new comparative genomic information for two understudied species and by evaluating different strategies for genome profiling in non-model taxa.

      The primary audience will include researchers in non-model plant genomics, cytogenetics, and evolutionary biology, although the methodological comparisons may also be useful to a broader community working on genome characterization in diverse lineages. My expertise is in plant genomics, genome size evolution, and repetitive DNA biology; I am not a specialist in flow cytometry instrumentation or cytological methods, so my evaluation of those aspects is based on general familiarity rather than technical depth.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      A review on "Nuclear genome profiling of two Mexican orchids of the genus Epidendrum" by Alcalá-Gaxiola et al. submitted to ReviewCommons

      The present manuscript presents genomic data for two endemic Mexican orchids: Epidendrum anisatum and E. marmoratum. The authors aim to determine the genome size and ploidy using traditional (flow cytometry and chromosome counts) and genomic techniques (k-mer analysis, heterozygosity), along with a characterization of the repetitive DNA composition.

      Considering the genomic composition, the main difference observed in repeat composition between the two species was attributed to the presence of a 172 bp satDNA (AniS1) in E. anisatum, which represents about 11% of its genome but is virtually absent in E. marmoratum. The differences in the genomic proportion of AniS1 and Ty3-gypsy/Ogre lineage TEs between E. anisatum and E. marmoratum are suggested as potential drivers of the GS difference identified between the two species.

      Our main concerns are about the GS estimation and chromosome number determination. Along with many issues related to GS estimations by flow cytometry, results related to chromosome number determination are missing from the manuscript. Improvements in both techniques and results are crucial, since the authors aim to compare different methods for GS and ploidy determination.

      R3. Genome size: From the abstract, it is not possible to understand that the authors confirm the GS by flow cytometry, as clarified later in the manuscript. Please, since the approach used to obtain the results is crucial in this manuscript, make it clear in the abstract.

      A: We have highlighted the congruence of flow cytometry and bioinformatic approaches in the abstract:

      "Multiple depths of coverage, k values, and k-mer-based tools for genome size estimation were explored and contrasted with cytometry genome size estimations. Cytometry and k-mer analyses yielded a consistently higher genome size for E. anisatum (mean 1C genome size = 2.59 Gb) than E. marmoratum (mean 1C genome size = 1.13 Gb), which represents a 2.3-fold genome size difference."

      R3. Flow cytometry methodology: For a standard protocol, it is mandatory to use at least three individuals, each analyzed in triplicate. It is also important to check the variation among measurements obtained from the same individual and among the values obtained from different individuals. Such variation should be below 3%. The result should be the average C-value followed by the standard deviation, which informs us of the variation among individuals and measurements.

      A: We have done three technical replicates of each tissue of the individuals of E. anisatum and E. marmoratum. To show the variation among replicates and tissues, we have included Supplementary Data Table S1. Intraspecific variation in genome size is beyond the scope of this work.

      R3. Checking Fig. 1, we could not see the Pisum peak. If the authors performed an analysis with an external standard, it should be clarified in the Methods. I suggest always using an internal standard.

      Besides, comparing Fig. 1 for leaf and pollinium, it seems necessary to adjust the settings of the flow cytometry equipment. Note that the 2C peak changes its position across graphs. The data could be placed more centrally on the x-axis by adjusting the flow cytometer settings.
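
The reviewer's criterion (variation below 3% across measurements) can be checked from the mean, standard deviation, and coefficient of variation of the replicate estimates. The following is a minimal sketch; the three replicate values are invented for illustration and are not the values in Supplementary Data Table S1, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_replicates(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, sample standard deviation, CV in percent)."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed")
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return m, sd, 100.0 * sd / m


if __name__ == "__main__":
    # Hypothetical 1C estimates (pg) from three technical replicates.
    replicates_pg = [2.61, 2.65, 2.68]
    m, sd, cv = summarize_replicates(replicates_pg)
    print(f"mean = {m:.2f} pg, SD = {sd:.3f} pg, CV = {cv:.2f}%")
    print("within 3% threshold" if cv < 3.0 else "exceeds 3% threshold")
```

This measures the variation among replicate estimates; it is distinct from the coefficient of variation of an individual histogram peak.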

      Action Required: Considering that authors want to compare indirect genomic approaches to determine the GS, I suggest authors improve the GS determination by Flow Cytometry.

      Please, in the Methodology section, keep both techniques focused on GS close to one another. Follow the same order in the Methodology, Results and Discussion sections.

      A: We have made several changes to the estimation and reporting of the flow cytometry genome size. Among these:

      We have clarified the use of the P. sativum internal standard and PBMCs in the Methods (P.6). We have added the associated mean coefficient of variation for both the sample and the internal reference in Supplementary Data Table S1, in order to show that the variation is not the result of an instrument error. We have changed the order of the paragraphs in the methods section to follow the order in other sections.

      R3. Chromosome count: In Introduction section (page 5), the authors explicitly aim to provide "bioinformatics ploidy level estimation and chromosome counting." Furthermore, the Methods section (page 7, subsection "Ploidy level estimation and chromosome counts") details a specific protocol for chromosome counting involving root tip pretreatment, fixation, and staining. However, no results regarding chromosome counting are presented in the manuscript. There are no micrographs of metaphase plates, no tables with counts, and no mention of the actual counts in the Results section or Supplementary Material. Despite this absence of evidence, the Discussion (Page 18) states: "ploidy and chromosome counts of both E. anisatum and E. marmoratum are the same (2n=40)." The value of 2n=40 is presented as a finding of this study; however, there is no reference to these results.

      Action Required: The authors must resolve this discrepancy by providing the missing empirical data (micrographs and counts). This detail needs to be reviewed with greater care and scientific integrity.

      A: We have added the chromosome count micrographs as Supplementary Data Fig. S7.

      Minor reviews (Suggestions):

      R3. Refining the Title (Optional): Although the current title is descriptive, we believe it undersells the value of the manuscript. Since this study provides the first genome profiling and repeatome characterization for the genus Epidendrum and offers important insights into the calibration of bioinformatics tools and flow cytometry for repetitive genomes, I suggest modifying the title to reflect these aspects. The comparative assessment of GS is also an important feature. This would make the article more attractive to a broader audience interested in genomics of non-model organisms.

      A: We have changed the title to "Nuclear genome profiling of two species of Epidendrum (Orchidaceae): genome size, repeatome and ploidy".

      R3. Botanical Nomenclature (Optional): Although citing taxonomic authorities is not strictly required in all fields of plant sciences, most botanical journals expect the full author citation at the first mention of each species. Including this information would improve the nomenclatural rigor of the manuscript and align it with common practices in botanical publishing.

      A: We have added the citation of the taxonomic authorities:

      "This study aims to use two closely related endemic Mexican species, Epidendrum anisatum Lex and Epidendrum marmoratum A. Rich. & Galeotti, to provide the first genomic profiling for this genus..."

      R3. Abbreviation of Genus Names: I noticed inconsistencies in the abbreviation of scientific names throughout the manuscript. Standard scientific style dictates that the full genus name (Epidendrum) should be written out only at its first mention in the Abstract and again at the first mention in the main text. Thereafter, it should be abbreviated (e.g., E. anisatum, E. marmoratum), unless the name appears at the beginning of a sentence or if abbreviation would cause ambiguity with another genus. Please revise the text to apply this abbreviation consistently.

      A: We have made the changes requested as necessary.

      R3. Genome Size Notation: In the Abstract and throughout the text, genome size estimates are presented using the statistical symbol for the mean (x). While mathematically accurate, this notation is generic and does not immediately inform the reader about the biological nature of the DNA content (i.e., whether it refers to the gametic 1C or somatic 2C value). In plant cytometry literature, it is standard practice to explicitly label these values using C-value terminology to prevent ambiguity and eliminate the effect of the number of chromosome sets (Bennett & Leitch 2005; Greilhuber et al. 2005; Doležel et al. 2018). I strongly suggest replacing references to "x" with "1C" (e.g., changing "x = 2.58 Gb" to "mean 1C value = 2.58 Gb") to ensure immediate clarity and alignment with established conventions in the field.

      A: We have revised the text in every instance, for example, in the results section:

      "The 1C value in gigabases (Gb; calculated from mass in pg) of E. anisatum ranged from 2.55 to 2.62 Gb (mean 1C value = 2.59 Gb) and that of E. marmoratum from 1.11 to 1.18 Gb (mean 1C value = 1.13 Gb; Supplementary Data Table S1)."

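The conversion from picograms to gigabases and the fold difference between the two species can be reproduced in a few lines. The sketch assumes the commonly used factor of 1 pg = 0.978 Gb (Doležel et al. 2003), which the response does not state explicitly; the function names are illustrative.

```python
PG_TO_GB = 0.978  # assumed conversion factor: 1 pg of DNA = 0.978 Gb


def pg_to_gb(mass_pg: float) -> float:
    """Convert DNA mass in picograms to genome size in gigabases."""
    return mass_pg * PG_TO_GB


def gb_to_pg(size_gb: float) -> float:
    """Convert genome size in gigabases back to DNA mass in picograms."""
    return size_gb / PG_TO_GB


def fold_difference(larger_gb: float, smaller_gb: float) -> float:
    """Ratio of the larger genome size to the smaller one."""
    return larger_gb / smaller_gb


if __name__ == "__main__":
    # Mean 1C values in Gb, as reported in the response.
    anisatum_gb, marmoratum_gb = 2.59, 1.13
    print(f"E. anisatum:   {gb_to_pg(anisatum_gb):.2f} pg/1C")
    print(f"E. marmoratum: {gb_to_pg(marmoratum_gb):.2f} pg/1C")
    ratio = fold_difference(anisatum_gb, marmoratum_gb)
    print(f"fold difference: {ratio:.1f}")
```
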
      R3. Justification of the Sequencing Method: Although the sequencing strategy is clearly described, the manuscript would benefit from a bit more contextualization regarding the choice of low-pass genome skimming. In the Introduction, a short justification of why this approach is suitable for estimating genome size, heterozygosity, and repeat composition, particularly in plants with large, repeat-rich genomes, would help readers better understand the methodological rationale. Likewise, in the Methods section, briefly outlining why the selected sequencing depth is appropriate, and how it aligns with previous studies using similar coverage levels, would strengthen the clarity of the methodological framework. These additions would make the rationale behind the sequencing approach more transparent and accessible to readers who may be less familiar with low-coverage genomic strategies.

      A: We have added the following short sentence in P.7:

      "This sequencing method produces suitable data sets without systematic biases, allowing the estimation of genome size and the proportion of repetitive DNA."

      R3. Wording Improvement Regarding RepeatExplorer2 Results: In the Results section, several sentences attribute biological outcomes to the RepeatExplorer2 "protocols" (e.g., "According to this protocol, both species have highly repetitive genomes..."; "The comparative protocol showed a 67% total repeat proportion, which falls between the estimated repeat proportions of the two species according to the results of the individual protocol"). Since the RepeatExplorer2 protocol itself only provides the analytical workflow and not species-specific results, this phrasing may be misleading.

      A: We have rephrased these sections to emphasize that these are "the results of" the protocols and not the protocols themselves.

      Reviewer #3 (Significance (Required)):

      Significance

      General assessment

      Strengths

      1. First Detailed Genomic Profile for the Genus Epidendrum: The study provides the first integrated dataset on genome size, ploidy, heterozygosity, and repeatome for species of the genus Epidendrum, a novel contribution for an extremely diverse and under-explored group in terms of cytogenomics.

      2. Cross-validation of in vitro and in silico analyses: Flow cytometry is considered the gold standard for genome size (GS) estimation because it physically measures DNA quantity (Doležel et al. 2007; Śliwińska 2018). However, it typically requires fresh tissue, which is not always available. Conversely, k-mer analysis is a rapid bioinformatics technique utilizing sequencing data that does not rely on a reference genome. Nevertheless, it is frequently viewed with skepticism or distrust due to discrepancies with laboratory GS estimates (Pflug et al. 2020; Hesse 2023). In this study, by comparing computational results with flow cytometry data, the authors were able to validate the reliability of computational estimates for the investigated species. Since the 'true' GS was already established via flow cytometry, the authors used this value as a benchmark to test various software tools (GenomeScope, findGSE, CovEst) and parameters. This approach allowed for the identification of which tools perform best for complex genomes. For instance, they found that tools failing to account for heterozygosity (such as findGSE-hom) drastically overestimated the genome size of E. anisatum, whereas GenomeScope and findGSE-het (which account for heterozygosity) yielded results closer to the flow cytometry values. Thus, they demonstrated that this cross-validation is an effective method for estimating plant genome sizes with greater precision. This integrative approach is essential not only for defining GS but also for demonstrating how bioinformatics methods must be calibrated (particularly regarding depth of coverage and maximum k-mer coverage) to provide accurate data for non-model organisms when flow cytometry is not feasible.

      Limitations

      1. Limited Taxonomic Sampling: The study analyzes only two species of Epidendrum, which restricts the ability to make broad inferences regarding genome evolution across the genus. Given the outstanding diversity of Epidendrum (>1,800 species), the current sampling is insufficient to propose generalized evolutionary patterns. As the authors state by the end of the Discussion (page 18): "Future work should investigate to what extent LTR transposons and satellite DNA have been responsible for shaping genome size variation in different lineages of Epidendrum, analyzing a greater portion of its taxic diversity in an evolutionary context."

      2. Lack of Cytogenetic Results and Mapping: One of the major findings of this study is the identification of the AniS1 satellite as a potential key driver of the genome size difference between the species, occupying ~11% of the E. anisatum genome and virtually absent in E. marmoratum. While the authors use bioinformatic metrics (C and P indices) to infer a dispersed organization in the Discussion (Page 18), the study lacks physical validation via Fluorescence in situ Hybridization (FISH) and a basic validation of the chromosome number. Without cytogenetic mapping, it is impossible to confirm the actual chromosomal distribution of this massive repetitive array, for instance, whether it has accumulated in specific heterochromatic blocks (e.g., centromeric or subtelomeric regions) or if it is genuinely interspersed along the chromosome arms. I suggest acknowledging this as a limitation in the Discussion, as the physical organization of such abundant repeats has significant implications for understanding the structural evolution of the species' chromosomes.

      Advance

      To the best of our knowledge, this study represents the first comprehensive genome profiling and repeatome characterization for any species of the genus Epidendrum. By integrating flow cytometry, k-mer-based approaches, and low-pass sequencing, the authors provide the first insights into the genomic architecture of Epidendrum, including quantitative assessments of transposable elements, lineage-specific satellite DNA, and repeat-driven genome expansion. This constitutes both a technical and a conceptual advance: technically, the study demonstrates the feasibility and limitations of combining in vitro and in silico methods for genome characterization in large, repeat-rich plant genomes; conceptually, it offers new evolutionary perspectives on how repetitive elements shape genome size divergence within a highly diverse orchid lineage. These results broaden the genomic knowledge base for Neotropical orchids and establish a foundational reference for future comparative, cytogenomic, and phylogenomic studies within Epidendrum and related groups.

      Audience

      This study will primarily interest a broad audience, including researchers in plant genomics, evolutionary biology, cytogenomics, and bioinformatics, especially those working with non-model plants or groups with large, repetitive genomes. It also holds relevance for scientists engaged in genome size evolution, repetitive DNA biology, and comparative genomics. Other researchers are likely to use this work as a methodological reference for genome profiling in non-model taxa, especially regarding the integration of flow cytometry and k-mer-based estimations and the challenges posed by highly repetitive genomes. The detailed repeatome characterization, including identification of lineage-specific satellites and retrotransposon dynamics, will support comparative genomic analyses, repeat evolution studies, and future cytogenetic validation (e.g., FISH experiments). Additionally, this dataset establishes a genomic baseline that can inform phylogenomic studies, species delimitation, and evolutionary inference within Epidendrum and related orchid groups.

      Reviewer's Backgrounds

      The review was prepared by two reviewers. Our expertise lies in evolution and biological diversity, with a focus on cytogenomics and genome size evolution. Among our projects in development, the cytogenomic evolution of Neotropical orchids (also focused on Epidendrum) is one of the main studies. These areas shape our perspective in evaluating the evolutionary, cytogenomic, and biological implications of the study. However, we have limited expertise in methodologies related to k-mer-based genome profiling and heterozygosity modeling. Therefore, our evaluation does not deeply assess the technical validity of these analytical pipelines.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      With this work, the authors provide genome profiling information on the Epidendrum genus. They performed low-coverage short-read sequencing and analysis, as well as flow cytometry approaches to estimate genome size, and performed a comparative analysis of these methods. They also used the WGS dataset to test different approaches and models for genome profiling, as well as repeat abundance estimation, emphasising the importance of genome profiling to provide basic and comparative genomic information in non-model study species. Results show that the two "closely-related" Epidendrum species analysed (E. marmoratum and E. anisatum) have different genome profiles, exhibiting a 2.3-fold genome size difference, mostly triggered by the expansion of repetitive elements in E. marmoratum, especially Ty3-gypsy LTR retrotransposons and a 172 tandem repeat (satellite DNA).

      Major comments:

      Overall, the manuscript is well-written, the aim, results and methods are explained properly, and although I missed some information in the introduction, the paper structure is overall good, and it doesn't lack any important information. The quality of the analysis is also adequate and no further big experiments or analysis would be needed. However, from my point of view, two main issues would need to be addressed:

      • The methods section is properly detailed and well explained. However, the project data and scripts are not available at the figshare link provided, and the BioProject code provided is not found at SRA. This needs to be solved as soon as possible, because if they are not available for review, the reproducibility of the manuscript cannot be fully assessed.
      • The authors specify in the methods that 0.06x and 0.43x sequencing depths were used as inputs for the RE analysis of E. anisatum and E. marmoratum. I understand these are differences based on the data availability and genome size differences. However, they don't correspond to either of the recommendations from Novak et al (2020):

      In the context of individual analysis: "The number of analyzed reads should correspond to 0.1-0.5× genome coverage. In the case of repeat-poor species, coverage can be increased up to 1.0-1.5×." Therefore, using 0.06x for E. anisatum should be justified, or at least addressed in the discussion. Moreover, using such a difference in coverage might affect any comparisons made using these results. Given that the number of reads is not limiting in this case, the reason for choosing these specific coverages should be discussed in detail.

      In the context of comparative analysis: "Because different genomes are being analyzed simultaneously, the user must decide how they will be represented in the analyzed reads, choosing one of the following options. First, the number of reads analyzed from each genome will be adjusted to represent the same genome coverage. This option provides the same sensitivity of repeat detection for all analyzed samples and is therefore generally recommended; however, it requires that genome sizes of all analyzed species are known and that they do not substantially differ. In the case of large differences in genome sizes, too few reads may be analyzed from smaller genomes, especially if many species are analyzed simultaneously. A second option is to analyze the same number of reads from all samples, which will provide different depth of analysis in species differing in their genome sizes, and this fact should be considered when interpreting analysis results. Because each of these analysis setups has its advantages and drawbacks, it is a good idea to run both and cross-check their results." Therefore, the authors should state what coverage was used for the comparative analysis (the methods only specify the coverage used for the individual analyses), and why.
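
      The two read-sampling options quoted above come down to simple arithmetic: coverage = (number of reads × read length) / genome size. The sketch below is a minimal illustration of that relation only. The genome sizes and read length are hypothetical placeholders (chosen to differ 2.3-fold), not values taken from the manuscript, and the function names are illustrative.

```python
import math


def reads_for_coverage(genome_size_bp, coverage, read_length_bp):
    """Reads needed so that reads * read_length ~= coverage * genome size."""
    if genome_size_bp <= 0 or coverage <= 0 or read_length_bp <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(genome_size_bp * coverage / read_length_bp)


def coverage_from_reads(n_reads, genome_size_bp, read_length_bp):
    """Genome coverage represented by a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical inputs: two genomes differing 2.3-fold, 150 bp reads.
small_genome_bp = 1_000_000_000
large_genome_bp = 2_300_000_000
read_length_bp = 150

# Option 1: same coverage (here 0.1x) for both genomes -> different read counts.
reads_small = reads_for_coverage(small_genome_bp, 0.1, read_length_bp)
reads_large = reads_for_coverage(large_genome_bp, 0.1, read_length_bp)

# Option 2: same number of reads for both genomes -> different coverage.
shared_reads = reads_small
cov_small = coverage_from_reads(shared_reads, small_genome_bp, read_length_bp)
cov_large = coverage_from_reads(shared_reads, large_genome_bp, read_length_bp)
```

      Under option 2, the larger genome is sampled at a proportionally lower depth, which is the effect that should be considered when interpreting a comparative analysis.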

      Minor comments:

      General comments:

      • The concept of genome endoreplication and the problem it represents for C-value estimations needs to be better contextualised. It would be nice to have some background information in the introduction on how this is an issue (especially in orchid species). The results shown are valuable and interesting but require a little more context on how frequent this is in plants, especially in orchids, and across different tissues.

      Comments and suggestions on the figures:

      • In fig 1, the flow cytometry histograms need to be more self-explanatory. What are the Y axis "counts" of? Also, please either place the label for both rows or for each, but don't make it redundant. The axis fonts need to be made a bit larger too. If possible, explain briefly in the figure legend (and not only in the text) what each peak means.
      • Fig 5. Horizontal axis labels are illegible. Please make these larger (maybe make the plot wider by moving the plot legend to the top/bottom of the figure? - just a suggestion).

      Small text editing suggestions:

      • Methods, "Ploidy level estimation and chromosome counts" section. It would be easier for the reader if this paragraph was either divided into two methods sections, or into two paragraphs at least, since these are two very different approaches and provide slightly different data or information.
      • Methods, "Genome size estimation by k-mer analysis" section. Please specify whether the coverage simulations (of 5x to 40x) were made based on 1c or 2c of the genome size? I assumed haploid genome size but best to clarify.
      • Results, "Genome size estimation by k-mer analysis and ploidy estimation" section. In the first two paragraphs, the results presented appear to conform to anticipated patterns based on known properties of these types of datasets. Although this information confirms expected patterns, it does not provide new or biologically significant insights into the genomes analysed. It may be beneficial to further summarize these paragraphs so that the focus of this section can shift toward the comparison of methods and the biological interpretation of the genome profiles of Epidendrum.
      • Discussion, "Genome size estimation using flow cytometry" section. In the second paragraph, it is discussed how potential endoreduplication events can "trick" the flow cytometry measurements. This has probably been discussed previously in other C-value calculation studies and would benefit from context from the literature. How does endoreduplication really affect C-value measurements across plant taxa? I understand it is a well-known issue, so maybe add some references?
      • Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the second paragraph, when mentioning the relative abundance of Ty3-gypsy and Ty1-copia elements, it is also worth mentioning their differences in genomic distribution and the potential structural role of Ty3-gypsy elements.
      • Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the third paragraph, it is mentioned that both species have 2n=40. I believe these are results from this work since there is a methods section for chromosome counting. This data should therefore go into results.
      • Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. I'd recommend expanding a bit more on repetitive DNA differences based on the RepeatExplorer results. Providing references on whether this has been found in other taxa would be helpful too. For example, Ogre bursts have been previously described in other species (e.g. legumes, Wang et al., 2025). Moreover, I consider it worth highlighting and discussing other interesting differences found, such as the differences in unknown repeats (which could be due to one species having "older" elements, too degraded to give any database hits, compared to the other), or Class II TE differences between species (and how these account less for genome size difference because of their size), etc.

      Significance

      Overall, this study provides a valuable contribution to our understanding of genome size diversity and repetitive DNA dynamics within Epidendrum, particularly through its combined use of low-coverage sequencing, flow cytometry, and comparative genome profiling. Its strongest aspects lie in the clear methodological framework and the integration of multiple complementary approaches, which together highlight substantial genome size divergence driven by repeat proliferation, an insight of clear relevance for orchid genomics and plant genome evolution more broadly.

      While the work would benefit from improved data availability, additional contextualization of the problem of endoreduplication in flow cytometry, and clarification of some figure elements and methodological details, the study nonetheless advances the field by presenting new comparative genomic information for two understudied species and by evaluating different strategies for genome profiling in non-model taxa.

      The primary audience will include researchers in non-model plant genomics, cytogenetics, and evolutionary biology, although the methodological comparisons may also be useful to a broader community working on genome characterization in diverse lineages. My expertise is in plant genomics, genome size evolution, and repetitive DNA biology; I am not a specialist in flow cytometry instrumentation or cytological methods, so my evaluation of those aspects is based on general familiarity rather than technical depth.

    1. # shade the term spread
       polygon(c(time(TB3MS), rev(time(TB3MS))), c(TB10YS, rev(TB3MS)), col = alpha("steelblue", alpha = 0.3), border = NA)

      This code works well:

      tt <- if (!is.null(attr(Spread, "index"))) index(Spread) else time(Spread)
      y10 <- as.numeric(TB10YS)
      y3 <- as.numeric(TB3MS)

      polygon(
        x = c(tt, rev(tt)),
        y = c(y10, rev(y3)),
        col = alpha("steelblue", 0.3),
        border = NA
      )

    1. QUESTIONS:

      What the hell is the answer to:

      1) Exercise 2: In the editor, three vectors are defined. Each one represents the box office numbers from the first three Star Wars movies. The first element of each vector indicates the US box office revenue, the second element refers to the Non-US box office. In this exercise, you’ll combine all these figures into a single vector with name ‘box_office’. Next construct a ‘matrix star_wars’ with 2 rows and 3 columns.

      2) Exercise 1: Write code to create an array ‘l’ with 3 sheets of 2 rows and 4 columns, filled with the first 5 letters of the alphabet.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Chen et al. engineered and characterized a suite of next-generation GECIs for the Drosophila NMJ that allow for the visualization of calcium dynamics within the presynaptic compartment, at presynaptic active zones, and in the postsynaptic compartment. These GECIs include ratiometric presynaptic Scar8m (targeted to synaptic vesicles), ratiometric active zone localized Bar8f (targeted to the scaffold molecule BRP), and postsynaptic SynapGCaMP8m. The authors demonstrate that these new indicators are a large improvement on the widely used GCaMP6 and GCaMP7 series GECIs, with increased speed and sensitivity. They show that presynaptic Scar8m accurately captures presynaptic calcium dynamics with superior sensitivity to the GCaMP6 and GCaMP7 series and with similar kinetics to chemical dyes. The active-zone targeted Bar8f sensor was assessed for the ability to detect release-site-specific nanodomain changes, but the authors concluded that this sensor is still too slow to accurately do so. Lastly, the use of postsynaptic SynapGCaMP8m was shown to enable the detection of quantal events with similar resolution to electrophysiological recordings. Finally, the authors developed a Python-based analysis software, CaFire, that enables automated quantification of evoked and spontaneous calcium signals. These tools will greatly expand our ability to detect activity at individual synapses without the need for chemical dyes or electrophysiology.

      We thank this Reviewer for the overall positive assessment of our manuscript and for the incisive comments.

      (1) The role of Excel in the pipeline could be more clearly explained. Lines 182-187 could be better worded to indicate that CaFire provides analysis downstream of intensity detection in ImageJ. Moreover, the data type of the exported data, such as .csv or .xlsx, should be indicated instead of 'export to graphical program such as Microsoft Excel'.

      We thank the Reviewer for these comments, many of which were shared by the other reviewers. In response, we have now 1) more clearly explained the role of Excel in the CaFire pipeline (lines 677-681), 2) revised the wording in lines 676-679 to indicate that CaFire provides analysis downstream of intensity detection in ImageJ, and 3) clarified the data type exported to Excel (lines 677-681). These efforts have improved the clarity and readability of the CaFire analysis pipeline.

      (2) In Figure 2A, the 'Excel' step should either be deleted or included as 'data validation' as ImageJ exports don't require MS Excel or any specific software to be analysed. (Also, the graphic used to depict Excel software in Figure 2A is confusing.)

      We thank the reviewer for this helpful suggestion. In Fig. 2A, we have changed the Excel portion and clarified the processing steps in the revised methods. Specifically, we now indicate that ROIs are first selected in Fiji/ImageJ and analyzed to obtain time-series data containing both the time information and the corresponding mean imaging intensity values. These data are then exported to a spreadsheet file (e.g., Excel), which is used to organize the output before being imported into CaFire for subsequent analysis. These changes can be found in Fig. 2A and the methods (lines 676-681).

      (3) Figure 2B should include the 'Partition Specification' window (as shown on the GitHub) as well as the threshold selection to give the readers a better understanding of how the tool works.

      We absolutely agree with this comment, and have made the suggested changes to Fig. 2B. In particular, we have replaced the software interface panels and now include windows illustrating the Load File, Peak Detection, and Partition functions. These updated screenshots provide a clearer view of how CaFire is used to load the data, detect events, and perform partition specification for subsequent analysis. We agree these changes will give the readers a better understanding of how the tool works, and we thank the reviewer for this comment.

      (4) The presentation of data is well organized throughout the paper. However, in Figure 6C, it is unclear how the heatmaps represent the spatiotemporal fluorescence dynamics of each indicator. Does the signal correspond to a line drawn across the ROI shown in Figure 6B? If so, this should be indicated.

      We apologize that the heatmaps were unclear in Fig. 6C (Fig. 7C in the current revision). Each heatmap is derived from a one-pixel-wide vertical line within a miniature-event ROI. These heatmaps correspond to the fluorescence change in the indicated SynapGCaMP variant for individual quantal events and their traces shown in Fig. 7C, with a representative image of the baseline and peak fluorescence shown in Fig. 7B. Specifically, we have added the following to the revised Fig. 7C legend:

      The corresponding heatmaps below were generated from a single vertical line extracted from a representative miniature-event ROI, and visualize the spatiotemporal fluorescence dynamics (ΔF/F) along that line over time.

      (5) In Figure 6D, the addition of non-matched electrophysiology recordings is confusing. Maybe add "at different time points" to the end of the 6D legend, or consider removing the electrophysiology trace from Figure 6D and referring the reader to the traces in Figure 7A for comparison (considering the same point is made more rigorously in Figure 7).

      This is a good point, one shared with another reviewer. We apologize this was not clear, and have now revised this part of the figure to remove the electrophysiological traces in what is now Fig. 7 while keeping the paired ones still in what is now Fig. 8A as suggested by the reviewer. We agree this helps to clarify the quantal calcium transients.

      (6) In GitHub, an example ImageJ Script for analyzing the images and creating the inputs for CaFire would be helpful to ensure formatting compatibility, especially given potential variability when exporting intensity information for two channels. In the Usage Guide, more information would be helpful, such as how to select ∆R/R, ideally with screenshots of the application being used to analyze example data for both single-channel and two-channel images.

      We agree that additional details added to the GitHub would be helpful for users of CaFire. In response, we have now added the following improvements to the GitHub site: 

      - ImageJ operation screenshots

      Step-by-step illustrations of ROI drawing and Multi Measure extraction.

      - Example Excel file with time and intensity values

      Demonstrates the required data format for CaFire import, including proper headers.

      - CaFire loading screenshots for single-channel and dual-channel imaging

      Shows how to import GCaMP into Channel 1 and mScarlet into Channel 2.

      - Peak Detection and Partition setting screenshots

      Visual examples of automatic peak detection, manual correction, and trace partitioning.

      - Instructions for ROI Extraction and CaFire Analysis

      A written guide describing the full workflow from ROI selection to CaFire data export.

      These changes have improved the usability and accessibility of CaFire, and we thank the reviewer for these points.

      Reviewer #2

      Calcium ions play a key role in synaptic transmission and plasticity. To improve calcium measurements at synaptic terminals, previous studies have targeted genetically encoded calcium indicators (GECIs) to pre- and postsynaptic locations. Here, Chen et al. improve these constructs by incorporating the latest GCaMP8 sensors and a stable red fluorescent protein to enable ratiometric measurements. In addition, they develop a new analysis platform, 'CaFire', to facilitate automated quantification. Using these tools, the authors demonstrate favorable properties of their sensors relative to earlier constructs. Impressively, by positioning postsynaptic GCaMP8m near glutamate receptors, they show that their sensors can report miniature synaptic events with speed and sensitivity approaching that of intracellular electrophysiological recordings. These new sensors and the analysis platform provide a valuable tool for resolving synaptic events using all-optical methods.

      We thank the Reviewer for their overall positive evaluation and comments.

      Major comments:

      (1) While the authors rigorously compared the response amplitude, rise, and decay kinetics of several sensors, key parameters like brightness and photobleaching rates are not reported. I feel that including this information is important as synaptically tethered sensors, compared to freely diffusible cytosolic indicators, can be especially prone to photobleaching, particularly under the high-intensity illumination and high-magnification conditions required for synaptic imaging. Quantifying baseline brightness and photobleaching rates would add valuable information for researchers intending to adopt these tools, especially in the context of prolonged or high-speed imaging experiments.

      This is a good point made by the reviewer, and one we agree will be useful for researchers to be aware of. First, it is important to note that the photobleaching and brightness of the sensors will vary depending on the nature of the user’s imaging equipment, which can vary significantly between widefield microscopes (with various LED or halogen light sources for illumination), laser scanning systems (e.g., line scans with confocal systems), or area scanning systems using resonant scanners (as we use in our current study). Under the same imaging settings, GCaMP8f and 8m exhibit comparable baseline fluorescence, whereas GCaMP6f and 6s are noticeably dimmer; because our aim is to assess each reagent’s potential under optimal conditions, we routinely adjust excitation/camera parameters before acquisition to place baseline fluorescence in an appropriate dynamic range. As an important addition to this study, motivated by the reviewer’s comments above, we now directly compare neuronal cytosolic GCaMP8m expression with our Scar8m sensor, showing higher sensitivity with Scar8m (now shown in the new Fig. 3F-H).

      Regarding photobleaching, GCaMP signals are generally stable, while mScarlet is more prone to bleaching: in presynaptic area-scanned confocal recordings, the mScarlet channel drops by ~15% over 15 secs, whereas GCaMP6s/8f/8m show no obvious bleaching over the same window (lines 549-553). In contrast, in presynaptic widefield imaging using an LED system (CCD), GCaMP8f shows ~8% loss over 15 secs (lines 610-611). Similarly, for postsynaptic SynapGCaMP6f/8f/8m, confocal resonant area scans show no obvious bleaching over 60 secs, while widefield imaging shows ~2–5% bleaching over 60 secs (lines 634-638). Finally, in active-zone/BRP calcium imaging (confocal), mScarlet again bleaches by ~15% over 15 secs, while GCaMP8f/8m show no obvious bleaching. The mScarlet-channel bleaching can be corrected in Huygens SVI (using Bleaching correction or via the Deconvolution Wizard), whereas we avoid applying bleaching correction to the green GCaMP channel when no clear decay is present, to prevent introducing artifacts. This information is now added to the methods (lines 548-553).

      (2) In several places, the authors compare the performance of their sensors with synthetic calcium dyes, but these comparisons are based on literature values rather than on side-by-side measurements in the same preparation. Given differences in imaging conditions across studies (e.g., illumination, camera sensitivity, and noise), parameters like indicator brightness, SNR, and photobleaching are difficult to compare meaningfully. Additionally, the limited frame rate used in the present study may preclude accurate assessment of rise times relative to fast chemical dyes. These issues weaken the claim made in the abstract that "...a ratiometric presynaptic GCaMP8m sensor accurately captures .. Ca²⁺ changes with superior sensitivity and similar kinetics compared to chemical dyes." The authors should clearly acknowledge these limitations and soften their conclusions. A direct comparison in the same system, if feasible, would greatly strengthen the manuscript.

      We absolutely agree with these points made by the reviewer, and have made a concerted effort to address them through the following:

      We have now directly compared presynaptic calcium responses on the same imaging system using the chemical dye Oregon Green BAPTA-1 (OGB-1), one of the primary synthetic calcium indicators used in our field. These experiments reveal that Scar8f exhibits markedly faster kinetics and an improved signal-to-noise ratio compared to OGB-1, with higher peak fluorescence responses (Scar8f: 0.32, OGB-1: 0.23). The rise time constants of the two indicators are comparable (both ~3 msecs), whereas the decay of Scar8f is faster than that of OGB-1 (Scar8f: ~40, OGB-1: ~60), indicating more rapid signal recovery. These results now directly demonstrate the superiority of the new GCaMP8 sensors we have engineered over conventional synthetic dyes, and are now presented in the new Fig. 3A-E of the manuscript.

      We agree with the reviewer that, in the original submission, the relatively slow resonant area scans (~115 fps) limited the temporal resolution of our rise time measurements. To address this, we have re-measured the rise time using higher frame-rate line scans (kHz). For Scar8f, the rise time constant was 6.736 msec with resonant area scans at ~115 fps, but shortened to 2.893 msec when imaged at ~303 fps, indicating that the original protocol underestimated how fast the true kinetics are. In addition, for Bar8m, area scans at ~118 fps yielded a rise time constant of 9.019 msec, whereas line scans at ~1085 fps reduced the rise time constant to 3.230 msec. These new measurements are now incorporated into the manuscript (Figs. 3, 4, and 6) to more accurately reflect the fast kinetics of these indicators.

      (3) The authors state that their indicators can now achieve measurements previously attainable with chemical dyes and electrophysiology. I encourage the authors to also consider how their tools might enable new measurements beyond what these traditional techniques allow. For example, while electrophysiology can detect summed mEPSPs across synapses, imaging could go a step further by spatially resolving the synaptic origin of individual mEPSP events. One could, for instance, image MN-Ib and MN-Is simultaneously without silencing either input, and detect mEPSP events specific to each synapse. This would enable synapse-specific mapping of quantal events - something electrophysiology alone cannot provide. Demonstrating even a proof-of-principle along these lines could highlight the unique advantages of the new tools by showing that they not only match previous methods but also enable new types of measurements.

      These are excellent points raised by the reviewer. In response, we have done the following: 

      We have now included a supplemental video as “proof-of-principle” data showing simultaneous imaging of SynapGCaMP8m quantal events at both MN-Is and -Ib, demonstrating that synapse-specific spatial mapping of quantal events can be obtained with this tool (see new Supplemental Video 1). 

      We have also included an additional discussion of the potential and limitations of these tools for new measurements beyond conventional approaches. This discussion is now presented in lines 419-421 in the manuscript.

      (4) For ratiometric measurements, it is important to estimate and subtract background signals in each channel. Without this correction, the computed ratio may be skewed, as background adds an offset to both channels and can distort the ratio. However, it is not clear from the Methods section whether, or how, background fluorescence was measured and subtracted.

      This is a good point, and we agree more clarification about how ratiometric measurements were made is needed. In response, we have now added the following to the Methods section (lines 548-568):

      Time-lapse videos were stabilized and bleach-corrected prior to analysis, which visibly reduced frame-to-frame motion and intensity drift. In the presynaptic and active-zone mScarlet channel, a bleaching factor of ~1.15 was observed during the 15 sec recording. This bleaching can be corrected using the “Bleaching correction” tool in Huygens SVI. For presynaptic and active-zone GCaMP signals, there was minimal bleaching over these short imaging periods. Therefore, the bleaching correction step for GCaMP was skipped. Both GCaMP and mScarlet channels were processed using the default settings in the Huygens SVI “Deconvolution Wizard” (with the exception of the bleaching correction option). Deconvolution was performed using the CMLE algorithm with the Huygens default stopping criterion and a maximum of 30 iterations, such that the algorithm either converged earlier or, if convergence was not reached, was terminated at this 30-iteration limit; no other iteration settings were used across the GCaMP series. ROIs were drawn on the processed images using Fiji ImageJ software, and mean fluorescence time courses were extracted for the GCaMP and mScarlet channels, yielding F<sub>GCaMP</sub>(t) and F<sub>mScarlet</sub>(t). F(t)s were imported into CaFire with GCaMP assigned to Channel #1 (signal; required) and mScarlet to Channel #2 (baseline/reference; optional). If desired, the mScarlet signal could be smoothed in CaFire using a user-specified moving-average window to reduce high-frequency noise. In CaFire’s ΔR/R mode, the per-frame ratio was computed as R(t)=F<sub>GCaMP</sub>(t)/F<sub>mScarlet</sub>(t); a baseline ratio R<sub>0</sub> was estimated from the pre-stimulus period, and the final response was reported as ΔR/R(t)=[R(t)−R<sub>0</sub>]/R<sub>0</sub>, which normalizes GCaMP signals to the co-expressed mScarlet reference and thereby reduces variability arising from differences in sensor expression level or illumination across AZs.
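
      The ΔR/R calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not CaFire's actual code: the function name, the use of the mean of the first `n_baseline_frames` ratios as R0, and the edge-padded moving average are all hypothetical choices made for the example.

```python
import numpy as np


def delta_r_over_r(f_gcamp, f_mscarlet, n_baseline_frames, smooth_window=1):
    """Compute dR/R from a signal trace and a reference trace.

    R(t) = F_GCaMP(t) / F_mScarlet(t); R0 is the mean ratio over the first
    n_baseline_frames (assumed here to be the pre-stimulus period).
    """
    f_gcamp = np.asarray(f_gcamp, dtype=float)
    f_ref = np.asarray(f_mscarlet, dtype=float)
    if f_gcamp.shape != f_ref.shape:
        raise ValueError("both channels must have the same length")
    if smooth_window > 1:
        # Optional moving average of the reference channel; edges are padded
        # with the nearest value so the output keeps the input length.
        left = smooth_window // 2
        right = smooth_window - 1 - left
        padded = np.pad(f_ref, (left, right), mode="edge")
        kernel = np.ones(smooth_window) / smooth_window
        f_ref = np.convolve(padded, kernel, mode="valid")
    ratio = f_gcamp / f_ref
    r0 = ratio[:n_baseline_frames].mean()
    return (ratio - r0) / r0


# Toy traces: three baseline frames followed by a transient.
gcamp = [100.0, 100.0, 100.0, 200.0, 150.0]
mscarlet = [50.0, 50.0, 50.0, 50.0, 50.0]
response = delta_r_over_r(gcamp, mscarlet, n_baseline_frames=3)
```

      Background subtraction, which the reviewer asks about, is not part of this sketch; if it were applied, it would be done on each channel before the ratio is taken.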

      (5) At line 212, the authors claim "... GCaMP8m showing 345.7% higher SNR over GCaMP6s....(Fig. 3D and E) ", yet the cited figure panels do not present any SNR quantification. Figures 3D and E only show response amplitudes and kinetics, which are distinct from SNR. The methods section also does not describe details for how SNR was defined or computed.

      This is another good point. We define SNR operationally as the fractional fluorescence change (ΔF/F). Traces were processed with CaFire, which estimates a per-frame baseline F<sub>0</sub>(t) with a user-configurable sliding window and percentile. In the Load File panel, users can specify both the length of the moving baseline window and the desired percentile; the default settings are a 50-point window and the 30th percentile, which corresponds to a 101-point window centered on each time point (previous 50 to next 50 samples), with the lower 30% of values within that window used to estimate F<sub>0</sub>(t). The signal was then computed as ΔF/F=[F(t)−F<sub>0</sub>(t)]/F<sub>0</sub>(t). This ΔF/F value is what we report as SNR throughout the manuscript and is now discussed explicitly in the revised methods (lines 686-693).
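A sliding-percentile baseline of this kind might look like the following Python sketch. It is not CaFire's implementation. Two assumptions are made for the example: F0(t) is taken as the 30th percentile (with linear interpolation) of the centered window, which is one reading of "the lower 30% of values", and the window is truncated at the ends of the trace.

```python
def percentile(sorted_values, pct):
    """Percentile of an already sorted list, using linear interpolation."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def sliding_baseline(trace, half_window=50, pct=30.0):
    """F0(t): percentile of the samples from t - half_window to t + half_window."""
    baseline = []
    for i in range(len(trace)):
        lo = max(0, i - half_window)
        hi = min(len(trace), i + half_window + 1)  # truncated at the trace edges
        baseline.append(percentile(sorted(trace[lo:hi]), pct))
    return baseline


def delta_f_over_f(trace, half_window=50, pct=30.0):
    """dF/F(t) = [F(t) - F0(t)] / F0(t)."""
    f0 = sliding_baseline(trace, half_window, pct)
    return [(f - b) / b for f, b in zip(trace, f0)]


# Hypothetical trace: flat fluorescence of 100 with one transient of 300.
trace = [100.0] * 200
trace[100] = 300.0
dff = delta_f_over_f(trace)
```

Because a single transient barely shifts a low percentile of 101 samples, the baseline stays at 100 and the transient is reported as ΔF/F = 2.0.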

      (6) Lines 285-287 "As expected, summed ΔF values scaled strongly and positively with AZ size (Fig. 5F), reflecting a greater number of Cav2 channels at larger AZs". I am not sure about this conclusion. A positive correlation between summed ΔF values and AZ size could simply reflect more GCaMP molecules in larger AZs, which would give rise to larger total fluorescence change even at a given level of calcium increase.

      The reviewer makes a good point, one that we agree should be clarified. The reviewer is indeed correct that larger active zones should have more abundant BRP protein, which in turn will lead to a higher abundance of the Bar8f sensor, which should lead to a higher GCaMP response simply by having more of this sensor. However, the inclusion of the ratiometric mScarlet protein should normalize the response accurately, correcting for this confound, in which the higher abundance of GCaMP should be offset (normalized) by the equally (stoichiometric) higher abundance of mScarlet. Therefore, when the ∆R/R is calculated, the differences in GCaMP abundance at each AZ should be corrected for by the ratiometric analysis. We now use an improved BRP::mScarlet3::GCaMP8m (Bar8m) and compute ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). CaFire estimated R<sub>0</sub> with a sliding 101-point window using the lowest 10% of values, and responses were reported as ΔR/R=[R−R<sub>0</sub>]/R<sub>0</sub>. Area-scan examples (118 fps) show robust ΔR/R transients (peaks ≈1.90 and 3.28; tau rise ≈9.0–9.3 ms; Fig. 6C, middle).

      We have now made these points more clearly in the manuscript (lines 700-704) and moved the Bar8f intensity vs active zone size data to Table S1. Together, these revisions address the indicator-abundance confound (via mScarlet normalization).
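The normalization argument can be checked with a small numerical example in Python. The traces and the abundance factor `k` below are invented for illustration, and the baseline is simply the mean of the first frames. If an AZ carries k times more of the stoichiometric fusion protein, both channels scale by k: the summed ΔF scales with k, while ΔR/R does not change.

```python
def summed_delta_f(trace, n_baseline):
    """Sum of F(t) - F0 over the trace, with F0 the mean of the baseline frames."""
    f0 = sum(trace[:n_baseline]) / n_baseline
    return sum(f - f0 for f in trace)


def peak_delta_r_over_r(gcamp, mscarlet, n_baseline):
    """Peak of [R(t) - R0] / R0 with R(t) = F_GCaMP(t) / F_mScarlet(t)."""
    ratio = [g / r for g, r in zip(gcamp, mscarlet)]
    r0 = sum(ratio[:n_baseline]) / n_baseline
    return max((r - r0) / r0 for r in ratio)


def scale(trace, k):
    """Model an AZ holding k times more sensor: every sample is multiplied by k."""
    return [k * v for v in trace]


# Hypothetical small AZ, and a larger AZ with 3x the sensor in both channels.
gcamp = [10.0, 10.0, 30.0, 20.0]
mscarlet = [5.0, 5.0, 5.0, 5.0]
k = 3.0

small_sum = summed_delta_f(gcamp, 2)
large_sum = summed_delta_f(scale(gcamp, k), 2)
small_ratio = peak_delta_r_over_r(gcamp, mscarlet, 2)
large_ratio = peak_delta_r_over_r(scale(gcamp, k), scale(mscarlet, k), 2)
```

Here the summed ΔF grows from 30 to 90, whereas the peak ΔR/R is 2.0 for both AZs. The cancellation holds only to the extent that the two fluorophores are present at a fixed ratio.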

      (7) Lines 313-314: "SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D)." This statement is quite confusing. In Figure 6D, the corresponding calcium and ephys traces look completely different and appear to reflect distinct sets of events. It was only after reading Figure 7 that I realized the traces shown in Figure 6D might not have been recorded simultaneously. The authors should clarify this point.

      Yes, we absolutely agree with this point, one shared by Reviewer 1. In response, we have removed the electrophysiological traces in Fig. 6 to clarify that just the calcium responses are shown, and save the direct comparison for the Fig. 7 data (now revised Fig. 8).

      (8) Lines 310-313: "SynapGCaMP8m .... striking an optimal balance between speed and sensitivity", and Lines 314-316: "We conclude that SynapGCaMP8m is an optimal indicator to measure quantal transmission events at the synapse." Statements like these are subjective. In the authors' own comparison, GCaMP8m is significantly slower than GCaMP8f (at least in terms of decay time), despite having a moderately higher response amplitude. It is therefore unclear why GCaMP8m is considered 'optimal'. The authors should clarify this point or explain their rationale for prioritizing response amplitude over speed in the context of their application.

      This is another good point that we agree with, as the “optimal” sensor will of course depend on the user’s objectives. Hence, we used the term “an optimal sensor” to indicate it is what we believed to be the best one for our own uses. However, this point should be clarified and better discussed. In response, we have revised the relevant sections of the manuscript to better define why we chose the 8m sensors to strike an optimal balance of speed and sensitivity for our uses, and go on to discuss situations in which other sensor variants might be better suited. These are now presented in lines 223-236 in the revised manuscript, and we thank the reviewer for making these comments, which have improved our study.

      Minor comments

      (1)  Please include the following information in the Methods section:

      (a) For Figures 3 and 4, specify how action potentials were evoked. What type of electrodes were used, where were they placed, and what amount of current or voltage was applied?

      We apologize for neglecting to include this information in the original submission. We have now added this information to the revised Methods section (lines 537-543).

      (b) For imaging experiments, provide information on the filter sets used for each imaging channel, and describe how acquisition was alternated or synchronized between the green and red channels in ratiometric measurements. Additionally, please report the typical illumination intensity (in mW/mm²) for each experimental condition.

      We thank the reviewer for this helpful comment. We have now added detailed information about the imaging configuration to the Methods (lines 512-528) with the following:

      Ca2+ imaging was conducted using a Nikon A1R resonant scanning confocal microscope equipped with a 60x/1.0 NA water-immersion objective (refractive index 1.33). GCaMP signals were acquired using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet/mCherry signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). ROIs focused on terminal boutons of MN-Ib or -Is motor neurons. For both channels, the confocal pinhole was set to a fixed diameter of 117.5 µm (approximately three Airy units under these conditions), which increases signal collection while maintaining adequate optical sectioning. Images were acquired as 256 × 64 pixel frames (two 12-bit channels) using bidirectional resonant scanning at a frame rate of ~118 frames/s; the scan zoom in NIS-Elements was adjusted so that this field of view encompassed the entire neuromuscular junction and was kept constant across experiments. In ratiometric recordings, the 488-nm (GCaMP) and 561-nm (mScarlet) channels were acquired in a sequential dual-channel mode using the same bidirectional resonant scan settings: for each time point, a frame was first collected in the green channel and then immediately in the red channel, introducing a small, fixed frame-to-frame temporal offset while preserving matched spatial sampling of the two channels.

      Directly measuring the absolute laser power at the specimen plane (and thus reporting illumination intensity in mW/mm²) is technically challenging on this resonant-scanning system, because it would require inserting a power sensor into the beam path and perturbing the optical alignment; consequently, we are unable to provide reliable absolute mW/mm² values. Instead, we now report all relevant acquisition parameters (objective, numerical aperture, refractive index, pinhole size, scan format, frame rate, and fixed laser/detector settings) and note that laser powers were kept constant within each experimental series and chosen to minimize bleaching and phototoxicity while maintaining an adequate signal-to-noise ratio. We have now added the details requested in the revised Methods section (lines 512-535), including information about the filter sets, acquisition settings, and typical illumination intensity.

      (2) Please clarify what the thin versus thick traces represent in Figures 3D, 3F, 4C, and 4E. Are the thin traces individual trials from the same experiment, or from different experiments/animals? Does the thick trace represent the mean/median across those trials, a fitted curve, or a representative example?

      We apologize that this was not clearer in the original submission. Thin traces are individual stimulus-evoked trials (“sweeps”) acquired sequentially from the same muscle/NMJ in a single preparation; the panel is shown as a representative example of recordings collected across animals. The thick colored trace is the trial-averaged waveform (arithmetic mean) of those thin traces after alignment to stimulus onset and baseline subtraction (no additional smoothing beyond what is stated in Methods). The thick black curve over the decay phase is a single-exponential fit used to estimate τ. Specifically, we fit the decay segment by linear regression on the natural-log–transformed baseline-subtracted signal, which is equivalent to fitting y = y<sub>peak</sub>·e<sup>−t/τdecay</sup> over the decay window (revised Fig.4D and Fig.5C legends).
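The log-linear decay fit can be sketched as follows in Python. This is a minimal illustration of the stated approach (ordinary least squares on ln y versus t), not the authors' code. The function name, the handling of non-positive samples, and the synthetic trace are assumptions.

```python
import math


def fit_decay_tau(times, signal):
    """Fit y = y_peak * exp(-t / tau) by least squares on (t, ln y).

    `times` are measured from the peak and `signal` is baseline-subtracted.
    Samples that are not positive are skipped because ln y is undefined there.
    """
    pairs = [(t, math.log(y)) for t, y in zip(times, signal) if y > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive samples to fit a line")
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_ln = sum(v for _, v in pairs) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    sxy = sum((t - mean_t) * (v - mean_ln) for t, v in pairs)
    slope = sxy / sxx                      # slope of ln y versus t is -1/tau
    intercept = mean_ln - slope * mean_t   # intercept is ln(y_peak)
    return -1.0 / slope, math.exp(intercept)


# Synthetic decay sampled at roughly 118 frames/s with tau = 50 ms.
times = [i * 0.0085 for i in range(20)]
signal = [2.0 * math.exp(-t / 0.05) for t in times]
tau, y_peak = fit_decay_tau(times, signal)
```

On noisy data the log transform gives more weight to low-amplitude samples late in the decay, so the choice of decay window affects the estimate of τ.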

      (3) Please clarify what the reported sample size (n) represents. Does it indicate the number of experimental repeats, the number of boutons or PSDs, or the number of animals?

      Again, we apologize this was not clear. (n) refers to the number of animals (biological replicates), which is reported in Supplementary Table 1. All imaging was performed at muscle 6, abdominal segment A3. Per preparation, we imaged 1–2 NMJs in total, acquiring 2–3 imaging stacks per NMJ, each targeting 2–3 different terminal boutons. For the standard stimulation protocol, we delivered 1-ms stimuli at 1 Hz and captured 14 stimuli in each 15-s time series (lines 730-736).

      Reviewer #3

      Genetically encoded calcium indicators (GECIs) are essential tools in neurobiology and physiology. Technological constraints in targeting and kinetics of previous versions of GECIs have limited their application at the subcellular level. Chen et al. present a set of novel tools that overcome many of these limitations. Through systematic testing in the Drosophila NMJ, they demonstrate improved targeting of GCaMP variants to synaptic compartments and report enhanced brightness and temporal fidelity using members of the GCaMP8 series. These advancements are likely to facilitate more precise investigation of synaptic physiology.

      This is a comprehensive and detailed manuscript that introduces and validates new GECI tools optimized for the study of neurotransmission and neuronal excitability. These tools are likely to be highly impactful across neuroscience subfields. The authors are commended for publicly sharing their imaging software.

      This manuscript could be improved by further testing the GECIs across physiologically relevant ranges of activity, including at high frequency and over long imaging sessions. The authors provide a custom software package (CaFire) for Ca2+ imaging analysis; however, to improve clarity and utility for future users, we recommend providing references to existing Ca2+ imaging tools for context and elaborating on some conceptual and methodological aspects, with more guidance for broader usability. These enhancements would strengthen this already strong manuscript.

      We thank the Reviewer for their overall positive evaluation and comments. 

      Major comments:

      (1) Evaluation of the performance of new GECI variants using physiologically relevant stimuli and frequency. The authors took initial steps towards this goal, but it would be helpful to determine the performance of the different GECIs at higher electrical stimulation frequencies (at least as high as 20 Hz) and for longer (10 seconds) (Newman et al, 2017). This will help scientists choose the right GECI for studies testing the reliability of synaptic transmission, which generally requires prolonged high-frequency stimulation.

      We appreciate this point by the reviewer and agree it would be of interest to evaluate sensor performance with higher frequency stimulation and for a longer duration. In response, we performed a variety of stimulation protocols at high intensities and times, but found it difficult to separate individual responses in these data given the decay kinetics of all calcium sensors. Hence, we elected not to include these in the revised manuscript. However, we have now included an evaluation of the sensors with 20 Hz electrical stimulation for ~1 sec using a direct comparison of Scar8f with OGB-1. These data are now presented in a new Fig. 3D,E and discussed in the manuscript (lines 396-403).

      (2) CaFire.

      The authors mention, in line 182: 'Current approaches to analyze synaptic Ca2+ imaging data either repurpose software designed to analyze electrophysiological data or use custom software developed by groups for their own specific needs.' References should be provided. CaImAn comes to mind (Giovannucci et al., 2019, eLife), but we think there are other software programs aimed at analyzing Ca2+ imaging data that would permit such analysis.

      Thank you for the thoughtful question. At this stage, we’re unable to provide a direct comparison with existing analysis workflows. In surveying prior studies that analyze Drosophila NMJ Ca²⁺ imaging traces, we found that most groups preprocess images in Fiji/ImageJ and then rely on their own custom-made MATLAB or Python scripts for downstream analysis (see Blum et al. 2021; Xing and Wu 2018). Because these pipelines vary widely across labs, a standardized head-to-head evaluation isn’t currently feasible. With CaFire, our goal is to offer a simple, accessible tool that does not require coding experience and minimizes variability introduced by custom scripts. We designed CaFire to lower the barrier to entry, promote reproducibility, and make quantal event analysis more consistent across users. We have added references to the sentence mentioned above.

      Regarding existing software that the reviewer mentioned – CaImAn (Giovannucci et al. 2019): We evaluated CaImAn, which is a powerful framework designed for large-scale, multicellular calcium imaging (e.g., motion correction, denoising, and automated cell/ROI extraction). However, it is not optimized for the per-event kinetics central to our project, such as extracting rise and decay times for individual quantal events at single synapses. Achieving this level of granularity would typically require additional custom Python scripting and parameter tuning within CaImAn’s code-centric interface. This runs counter to CaFire’s design goals of a no-code, task-focused workflow that enables users to analyze miniature events quickly and consistently without specialized programming expertise.

      Regarding Igor Pro (WaveMetrics), (Müller et al. 2012): Igor Pro is another platform that can be used to analyze calcium imaging signals. However, it is commercial (paid) software and generally requires substantial custom scripting to fit the specific analyses we need. In practice, it does not offer a simple, open-source, point-and-click path to per-event kinetic quantification, which is what CaFire is designed to provide.

      The authors should be commended for making their software publicly available, but there are some questions:

      How does CaFire compare to existing tools?

      As mentioned above, we have not been able to adapt the custom scripts used by various labs for our purposes, including software developed in MATLAB (Blum et al. 2021), Python (Xing and Wu 2018), and Igor (Müller et al. 2012). Some in the field do use semi-publicly available software, including Nikon Elements (Chen and Huang 2017) and CaImAn (Giovannucci et al. 2019). However, these platforms are not optimized for the per-event kinetics central to our project, such as extracting rise and decay times for individual quantal events at single synapses. We have added more details about CaFire, focusing on the workflow and measurements, to show that it provides a no-code, standardized pipeline with automated miniature-event detection and per-event metrics (e.g., amplitude, rise time τ, decay time τ), optional ΔR/R support, and an auto-partition feature. Collectively, these features make CaFire simpler to operate without programming expertise, more transparent and reproducible across users, and better aligned with the event-level kinetics required for this project.

      Very few details about the Huygens deconvolution algorithms and input settings were provided in the methods or text (outside of MLE algorithm used in STED images, which was not Ca2+ imaging). Was it blind deconvolution? Did the team distill the point-spread function for the fluorophores? Were both channels processed for ratiometric imaging? Were the same settings used for each channel? Importantly, please include SVI Huygens in the 'Software and Algorithms' Section of the methods.

      We thank the reviewer for raising this important point. We have now expanded the Methods to describe our use of Huygens in more detail and have added SVI Huygens Professional (Scientific Volume Imaging, Hilversum, The Netherlands) to the “Software and Algorithms” section. For Ca²⁺ imaging data, time-lapse stacks were processed in the Huygens Deconvolution Wizard using the standard estimation algorithm (CMLE). This is not a blind deconvolution procedure. Instead, Huygens computes a theoretical point-spread function (PSF) from the full acquisition metadata (objective NA, refractive index, voxel size/sampling, pinhole, excitation/emission wavelengths, etc.); if refractive index values are provided and there is a mismatch, the PSF is adjusted to account for spherical aberration. We did not experimentally distill PSFs from bead measurements, as Huygens’ theoretical PSFs are sufficient for our data.

      Both green (GCaMP) and red (mScarlet) channels were processed for ratiometric imaging using the same workflow (stabilization, optional bleaching correction, and deconvolution within Huygens). For each channel, the PSF, background, and SNR were estimated automatically by the same built-in algorithms, so the underlying procedures were identical even though the numerical values differ between channels because of their distinct wavelengths and noise characteristics. Importantly, Huygens normalizes each PSF to unit total intensity, such that the deconvolution itself does not add or remove signal and therefore preserves intensity ratios between channels; only background subtraction and bleaching correction can change absolute fluorescence values. For the mScarlet channel, where we observed modest bleaching (~1.10 over 15 sec), we applied Huygens’ bleaching correction and visually verified that similar structures maintained comparable intensities after correction. For presynaptic GCaMP signals, bleaching over these short recordings was negligible, so we omitted the bleaching-correction step to avoid introducing multiplicative artifacts. This workflow ensures that ratiometric ΔR/R measurements are based on consistently processed, intensity-conserving deconvolved images in both channels.

      The number of deconvolution iterations could have had an effect when comparing GCAMP series; please provide an average number of iterations used for at least one experiment. For example, Figure 3, Syt::GCAMP6s, Scar8f & Scar8m, and, if applicable, the maximum number of permissible iterations.

      We thank the reviewer for this comment. For all Ca²⁺ imaging datasets, deconvolution in Huygens was performed using the recommended default settings of the CMLE algorithm with a maximum of 30 iterations. The stopping criterion was left at the Huygens default, so the algorithm either converged earlier or, if convergence was not reached, terminated at this 30-iteration limit. No other iteration settings were used across the GCaMP series (lines 555-559).

      Please clarify if the 'Express' settings in Huygens changed algorithms or shifted input parameters.

      We appreciate the reviewer’s question regarding the Huygens “Express” settings. For clarity, we note that all Ca²⁺ imaging data reported in this manuscript were deconvolved using the “Deconvolution Wizard”, not the “Deconvolution Express” mode. In the Wizard, we explicitly selected the CMLE algorithm (or GMLE in a few STED-related cases as recommended by SVI), using the recommended maximum of 30 iterations and other recommended settings, while allowing Huygens to auto-estimate background and SNR for each channel. Bleaching correction was toggled manually per channel (applied to mScarlet when bleaching was evident, omitted for GCaMP when bleaching was negligible), as described in the revised Methods (lines 553-559).

      By contrast, the Deconvolution Express tool in Huygens is a fully automated front-end that can internally adjust both the choice of deconvolution algorithm (e.g., CMLE vs. GMLE/QMLE) and key input parameters such as SNR, number of iterations, and quality threshold based on the selected “smart profile” and the image metadata. In preliminary tests on our datasets, Express sometimes produced results that were either overly smoothed or showed subtle artifacts, so we did not use it for any data included in this study. Instead, we relied exclusively on the Wizard with explicitly controlled settings to ensure consistency and transparency across all GCaMP series and ratiometric analyses.

      We suggest including a sample data set, perhaps in Excel, so that future users can beta test on and organize their data in a similar fashion.

      We agree that this would be useful, a point shared by R1 above. In response, we have added a sample data set to the GitHub site and included sample ImageJ data along with screenshots to explain the analysis in more detail. These improvements are discussed in the manuscript (lines 705-708).

      (3) While the challenges of AZ imaging are mentioned, it is not discussed how the authors tackled each one. What is defined as an active zone? Active zones are usually identified under electron microscopy. Arguably, the limitation of GCaMP-based sensors targeted to individual AZs, being unable to resolve local Ca2+ changes at individual boutons reliably, might be incorrect. This could be a limitation of the optical setup being used here. Please discuss further. What sensor performance do we need to achieve this performance level, and/or what optical setup would we need to resolve such signals?

      We appreciate the reviewer’s thoughtful comments and agree that the technical challenges of active zone (AZ) Ca²⁺ imaging merit further clarification. We defined AZs, as is the convention in our field, as individual BRP puncta at NMJs. These BRP puncta colocalize with individual puncta of other AZ components, including CAC, RBP, Unc13, etc. ROIs were drawn tightly over individual BRP puncta and only clearly separable spots were included.

      To tackle the specific obstacles of AZ imaging (small signal volume, high AZ density, and limited photon budget at high frame rates), we implemented both improved sensors and optimized analysis (Fig. 6). First, we introduced a ratiometric AZ-targeted indicator, BRP::mScarlet3::GCaMP8m (Bar8m), and computed ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). Under our standard resonant area-scan conditions (~118 fps), Bar8m produces robust ΔR/R transients at individual AZs (example peaks ≈ 3.28; τ<sub>rise</sub>≈9.0 ms; Fig. 6C, middle), indicating that single-AZ signals can be detected reproducibly when AZs are optically resolvable.

      Second, we increased temporal resolution using high-speed Galvano line-scan imaging (~1058 fps), which markedly sharpened the apparent kinetics (τ<sub>rise</sub>≈3.23 ms) and revealed greater between-AZ variability (Fig. 6C, right; 6D–E). Population analyses show that line scans yield much faster rise times than area scans (Fig. 6D) and a dramatically higher fraction of significantly different AZ pairs (8.28% and 4.14% in 8f and 8m area-scan vs 78.62% in 8m line-scan, lines 721-725), uncovering pronounced AZ-to-AZ heterogeneity in Ca²⁺ signals. Together, these revisions demonstrate that under our current confocal configuration, AZ-targeted GCaMP8m can indeed resolve local Ca²⁺ changes at individual, optically isolated boutons.

      We have revised the Discussion to clarify that our original statement about the limitations of AZ-targeted GCaMPs refers specifically to this combination of sensor and optical setup, rather than an absolute limitation of AZ-level Ca²⁺ imaging. In our view, further improvements in baseline brightness and dynamic range (ΔF/F or ΔR/R per action potential), combined with sub-millisecond kinetics and minimal buffering, together with optical configurations that provide smaller effective PSFs and higher photon collection (e.g., higher-NA objectives, optimized 2-photon or fast line-scan modalities, and potentially super-resolution approaches applied to AZ-localized indicators), are likely to be required to achieve routine, high-fidelity Ca²⁺ measurements at every individual AZ within a neuromuscular junction.

      (4) In Figure 5: Only GCAMP8f (Bar8f fusion protein) is tested here. Consider including testing with GCAMP8m. This is particularly relevant given that GCAMP8m was a more successful GECI for subcellular post-synaptic imaging in Figure 6.

      We appreciate this point and request by Reviewer 3. The main limitation for detecting local calcium changes at AZs is the speed of the calcium sensor, and hence we used the fastest available (GCaMP8f) to test the Bar8f sensor. While replacing GCaMP8f with GCaMP8m would indeed be predicted to enhance sensitivity (SNR), since GCaMP8m does not have faster kinetics relative to GCaMP8f, it is unlikely to be a more successful GECI for visualizing local calcium differences at AZs. 

      That being said, we agree that the Bar8m tool, including the improved mScarlet3 indicator, would likely be of interest and use to the field. Fortunately, we had engineered the Bar8m sensor while this manuscript was in review, and just recently received transgenic flies. We have evaluated this sensor, as requested by the reviewer, and included our findings in Fig. 1 and 6. In short, while the sensitivity is indeed enhanced in Bar8m compared to Bar8f, the kinetics remain insufficient to capture local AZ signals. These findings are discussed in the revised manuscript (lines 424-442, 719-730), and we appreciate the reviewer for raising these important points.

      In earlier experiments, Bar8f yielded relatively weak fluorescence, so we traded frame rate for image quality during resonant area scans (~60 fps). After switching to Bar8m, the signal was bright enough to restore our standard 118 fps area-scan setting. Nevertheless, even with dual-channel resonant area scans and ratiometric (GCaMP/mScarlet) analysis, AZ-to-AZ heterogeneity remained difficult to resolve. Because Ca²⁺ influx at individual active zones evolves on sub-millisecond timescales, we adopted a high-speed single-channel Galvano line-scan (~1 kHz) to capture these rapid transients. We first acquired a brief area image to localize AZ puncta, then positioned the line-scan ROI through the center of the selected AZ. This configuration provided the temporal resolution needed to uncover heterogeneity that was under-sampled in area-scan data. Consistent with this, Bar8m line-scan data showed markedly higher AZ heterogeneity (significant AZ-pair rate ~79%, vs. ~8% for Bar8f area scans and ~4% for Bar8m area scans), highlighting Bar8m’s suitability for quantifying AZ diversity. We have updated the text, Methods, and figure legend accordingly.

      (5) Figure 5D and associated datasets: Why was Interquartile Range (IQR) testing used instead of ZScoring? Generally, IQR is used when the data is heavily skewed or is not normally distributed. Normality was tested using the D'Agostino & Pearson omnibus normality test and found that normality was not violated. Please explain your reasoning for the approach in statistical testing. Correlation coefficients in Figures 5 E & F should also be reported on the graph, not just the table. In Supplementary Table 1. The sub-table between 4D-F and 5E-F, which describes the IQR, should be labeled as such and contain identifiers in the rows describing which quartile is described. The table description should be below. We would recommend a brief table description for each sub-table.

      Thank you for this helpful suggestion. We have updated the analysis in two complementary ways. First, we now perform paired two-tailed t-tests between every two AZs within the same preparation (pairwise AZ–AZ comparisons of peak responses). At α = 0.05, the fraction of significant AZ pairs is ~79% for Bar8m line-scan data versus ~8% for Bar8f area-scan data, indicating markedly greater AZ-to-AZ diversity when measured at high temporal resolution. Second, for visually marking the outlying AZs, we re-computed the IQR (Q1–Q3) based on the individual values collected from each AZ (15 data points per AZ, 30 AZs for each genotype), and marked AZs whose mean response falls above Q3 or below Q1; IQR is used here solely as a robust dispersion reference rather than for hypothesis testing. Both analyses support the same observation: Bar8m line-scan data reveal substantially higher AZ heterogeneity than Bar8f and Bar8m area-scan data. We have revised the Methods, figure panels, and legends accordingly (t-test details; explicit “IQR (Q1–Q3)” labeling; significant AZ-pair rates reported on the plots) (lines 719-730).
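The IQR-based marking step can be illustrated with a short Python sketch. It covers only the visual marking, not the pairwise t-tests, and is not the authors' code: the function name, the toy data, and the quartile convention (the default of Python's `statistics.quantiles`) are assumptions. The response above describes 15 values per AZ and 30 AZs per genotype; the example uses a much smaller made-up set.

```python
import statistics


def mark_outlying_azs(responses_by_az):
    """Mark AZs whose mean response lies outside the pooled Q1-Q3 range.

    `responses_by_az` maps an AZ label to its list of individual responses.
    Quartiles are computed on all individual values pooled across AZs.
    """
    pooled = [v for values in responses_by_az.values() for v in values]
    q1, _median, q3 = statistics.quantiles(pooled, n=4)
    marked = {}
    for az, values in responses_by_az.items():
        mean_response = statistics.fmean(values)
        if mean_response > q3:
            marked[az] = "above Q3"
        elif mean_response < q1:
            marked[az] = "below Q1"
    return (q1, q3), marked


# Hypothetical peak responses for four AZs, three events each.
example = {
    "AZ1": [1.0, 1.1, 0.9],
    "AZ2": [2.0, 2.1, 1.9],
    "AZ3": [2.0, 2.0, 2.0],
    "AZ4": [3.0, 3.1, 2.9],
}
(q1, q3), marked = mark_outlying_azs(example)
```

With this toy input, AZ1 falls below Q1 and AZ4 above Q3, while AZ2 and AZ3 are left unmarked. Different quartile conventions can shift Q1 and Q3 slightly for small samples, which is why the convention is worth stating alongside the plot.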

      (6) Figure 6 and associated data. The authors mention: ' SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D).' If that was the case, shouldn't the ephys and optical signal show some sort of correlation? The data presented in Figure 6D show no such correlation. Where do these signals come from? It is important to show the ROIs on a reference image.

      We apologize this was not clear, as similar points were raised by R1 and R2. We were just showing separate (uncorrelated) sample traces of electrophysiological and calcium imaging data. Given how confusing this presentation turned out to be, and the fact that we show the correlated ephys and calcium imaging events in Fig. 7, we have elected to remove the uncorrelated electrophysiological events in Fig. 6 to just focus on the calcium imaging events (now Figures 7 and 8).

      Figure 7B: Were Ca2+ transients not associated with mEPSPs ever detected? What is the rate of such events?

      This is an astute question. Yes indeed, during simultaneous calcium imaging and current clamp electrophysiology recordings, we occasionally observed GCaMP transients without a detectable mEPSP in the electrophysiological trace. This may reflect the detection limit of electrophysiology for very small minis; with our noise level and the technical limitation of the recording rig, events < ~0.2 mV cannot be reliably detected, whereas the optical signal from the same quantal event might still be detected. The fraction of calcium-only events was ~1–10% of all optical miniature events, depending on genotype (higher in lines with smaller average minis). These calcium-only detections were low-amplitude and clustered near the optical threshold (lines 361-365).

      Minor comments

      (1) It should be mentioned in the text or figure legend whether images in Figure 1 were deconvolved, particularly since image pre-processing is only discussed in Figure 2 and after.

      We thank the reviewer for pointing this out. Yes, the confocal images shown in Figure 1 were also deconvolved in Huygens using the CMLE-based workflow described in the revised Methods. We applied deconvolution to improve contrast, reduce out-of-focus blur, and better resolve the morphology of presynaptic boutons, active zones, and postsynaptic structures, so that the localization of each sensor is more clearly visualized. We have now explicitly stated in the Fig. 1 legend and Methods (lines 575-577) that these images were deconvolved prior to display. 

      (2) The abbreviation, SNR, signal-to-noise ratio, is not defined in the text.

      We have corrected this error and thank the reviewer for pointing this out.

      (3) Please comment on the availability of fly stocks and molecular constructs.

      We have clarified that all fly stocks and molecular constructs will be shared upon request (lines 747-750). We are also in the process of depositing the new Scar8f/m, Bar8f/m, and SynapGCaMP sensors to the Bloomington Drosophila Stock Center for public dissemination.

      (4) Please add detection wavelengths and filter cube information for live imaging experiments for both confocal and widefield.

      We thank the reviewer for this helpful suggestion. We have now added the detection wavelengths and filter cube configurations for both confocal and widefield live imaging to the Methods.

      For confocal imaging, GCaMP signals were acquired on a Nikon A1R system using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). Both channels were detected with GaAsP detectors under the same pinhole and scan settings described above (lines 512-517).

      For widefield imaging, GCaMP was recorded using a GFP filter cube (LED excitation ~470/40 nm; emission ~525/50 nm), which is now explicitly described in the revised Methods section (lines 632-633).

      (5) Please include a mini frequency analysis in Supplemental Figure S1.

      We apologize for not including this information in the original submission. This is now included in the Supplemental Figure S1.

      (6) In Figure S1B, consider flipping the order of EPSP (currently middle) and mEPSP (currently left), to easily guide the reader through the quantification of Figure S1A (EPSPs, top traces & mEPSPs, bottom traces).

      We agree these modifications would improve readability and clarity. We have now re-ordered the electrophysiological quantifications in Fig. S1B as requested by the reviewer.

      (7) Figure 6C: Consider labeling with sensor name instead of GFP.

      We agree here as well, and have removed “GFP” and instead added the GCaMP variant to the heatmap in Fig. 7C.

      (8) Figure 6E, 7B, 7E: Main statistical differences highlighting sensor performance should be represented on the figures for clarity.

      We did not show these differences in the original submission in an effort to keep the figures “clean” and for clarity, putting the detailed statistical significance in Table S1. However, we agree with the reviewer that it would be easier to see these in the Fig. 6E and 7B,E graphs. This information has now been added to Figs. 7 and 8.

      (9) Please report if the significance tested between the ephys mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-) is the same as for Ca2+ mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-). These should also exhibit a very high correlation (mEPSP (mV) vs Ca2+ mini deltaF/F). These tests would significantly strengthen the final statement of "SynapGCaMP8m can capture physiologically relevant differences in quantal events with similar sensitivity as electrophysiology."

      We agree that adding the more detailed statistical analysis requested by the reviewer would strengthen the evidence for the resolution of quantal calcium imaging using SynapGCaMP8m. We have included the statistical significance between the ephys and calcium minis in Fig. 8 and included the following in the revised methods (lines 358-361), the Fig. 8 legend and Table S1:

      Using two-sample Kolmogorov–Smirnov (K–S) tests, we found that SynapGCaMP8m Ca²⁺ minis (ΔF/F, Fig. 8E) differ significantly across all genotype pairs (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>; all p < 0.0001). The genotype rank order of the group means (±SEM) is IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.967 ± 0.036; 0.713 ± 0.021; 0.427 ± 0.017; n=69, 65, 59). For electrophysiological minis (mEPSP amplitude, Fig. 8F), K–S tests likewise show significant differences for the same comparisons (all p < 0.0001) with D statistics of 0.1854, 0.3647, and 0.4043 (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>, respectively). Group means (±SEM) again follow IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.824 ± 0.017 mV; 0.636 ± 0.015 mV; 0.383 ± 0.007 mV; n=41 each). These K–S results demonstrate identical significance and rank order across modalities, supporting our conclusion that SynapGCaMP8m resolves physiologically relevant quantal differences with sensitivity comparable to electrophysiology.
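For readers unfamiliar with the D statistic reported above, here is a minimal sketch of how the two-sample K-S statistic is defined: the largest vertical gap between the two empirical cumulative distribution functions. The sample values are made up for illustration and are not the data from this study. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used, since it also returns the p-value.

```python
def ks_two_sample_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x so ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical amplitudes, not the measured minis.
group_1 = [1, 2, 3, 4]
group_2 = [3, 4, 5, 6]
d_stat = ks_two_sample_d(group_1, group_2)
print(d_stat)
```

Once one sample is exhausted its ECDF equals 1 and the gap can only shrink, which is why the loop can stop at that point.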

      References

      Blum, Ian D., Mehmet F. Keleş, El-Sayed Baz, Emily Han, Kristen Park, Skylar Luu, Habon Issa, Matt Brown, Margaret C. W. Ho, Masashi Tabuchi, Sha Liu, and Mark N. Wu. 2021. 'Astroglial Calcium Signaling Encodes Sleep Need in Drosophila', Current Biology, 31: 150-62.e7.

      Chen, Y., and L. M. Huang. 2017. 'A simple and fast method to image calcium activity of neurons from intact dorsal root ganglia using fluorescent chemical Ca(2+) indicators', Mol Pain, 13: 1744806917748051.

      Giovannucci, Andrea, Johannes Friedrich, Pat Gunn, Jérémie Kalfon, Brandon L. Brown, Sue Ann Koay, Jiannis Taxidis, Farzaneh Najafi, Jeffrey L. Gauthier, Pengcheng Zhou, Baljit S. Khakh, David W. Tank, Dmitri B. Chklovskii, and Eftychios A. Pnevmatikakis. 2019. 'CaImAn an open source tool for scalable calcium imaging data analysis', eLife, 8: e38173.

      Müller, M., K. S. Liu, S. J. Sigrist, and G. W. Davis. 2012. 'RIM controls homeostatic plasticity through modulation of the readily-releasable vesicle pool', J Neurosci, 32: 16574-85.

      Wu, Yifan, Keimpe Wierda, Katlijn Vints, Yu-Chun Huang, Valerie Uytterhoeven, Sahil Loomba, Fran Laenen, Marieke Hoekstra, Miranda C. Dyson, Sheng Huang, Chengji Piao, Jiawen Chen, Sambashiva Banala, Chien-Chun Chen, El-Sayed Baz, Luke Lavis, Dion Dickman, Natalia V. Gounko, Stephan Sigrist, Patrik Verstreken, and Sha Liu. 2025. 'Presynaptic Release Probability Determines the Need for Sleep', bioRxiv: 2025.10.16.682770.

      Xing, Xiaomin, and Chun-Fang Wu. 2018. 'Unraveling Synaptic GCaMP Signals: Differential Excitability and Clearance Mechanisms Underlying Distinct Ca<sup>2+</sup> Dynamics in Tonic and Phasic Excitatory, and Aminergic Modulatory Motor Terminals in Drosophila', eneuro, 5: ENEURO.0362-17.2018.

    1. t provides "no-code" components for vector store integration (e.g., Pinecone), prompt engineering, and RAG (Retrieval-Augmented Generation) workflows.

      Matillion is now focused on chunking, OCR, embeddings etc for vector databases

    2. dbt (Data Build Tool) has become the industry standard for SQL transformations, offering a code-based, version-controlled alternative to Matillion’s visual UI.

      DBT is already starting to dominate the AI data transformation market

    1. highly compatible with existing Postgres code, engineering teams can migrate with minimal rewriting of the application layer.

      This is probably a big decision when actually going to market

    1. proffers an Afrocentric model in which Nommo is graphically posited as the center around which eight elements—rhythm, soundin’, stylin’, improvisation, storytelling, lyrical code, image making, and call and response

      nommo and its 8 ( eight ) elements

    1. Strategies and Tools for Effective Cooperation in Schools

      Executive Summary

      Cooperation in the classroom is more than simple group work; it is a powerful lever for learning and a civic competency written into the French common core (socle commun, cycles 3 and 4).

      This document summarizes the teaching approaches and practical tools needed to turn cooperation from an organizational constraint into a driver of success.

      Key points include the teacher adopting a posture of "letting go", establishing a structured framework for managing noise and roles, and using visual monitoring tools such as the tetrahedron.

      Assessment that focuses on the cooperative competency itself, rather than on the final product alone, proves essential for building student autonomy.

      1. Foundations and Stakes of Cooperation

      Cooperation is defined as the act of learning together through sharing ideas, mutual practice, and confronting points of view.

      It should not be seen as a mere practical arrangement, but as a fundamental mission of the school.

      Institutional legitimacy: Cooperation is a competency of the common core of knowledge, skills, and culture. It is explicitly taught and assessed.

      Scientific validation: A study published in the journal Science in 2019 confirms that students learn better when they are active, even though they sometimes perceive the opposite when comparing with lectures.

      Cross-cutting skills developed:

      ◦ Organization and planning.

      ◦ Debate, argumentation, and active listening.

      ◦ Managing emotions and conflicts.

      ◦ Ability to make concessions.

      2. The Teacher's Posture: Structured "Letting Go"

      To succeed, the teacher must accept a change of posture.

      "Letting go" does not mean total self-management by students, but delegation and acceptance of the unpredictable.

      Accepting error: Let students search, make mistakes, and start again.

      Managing the unexpected: Anticipate that debates may become heated and that the noise level will rise.

      Forming groups: There is no universal solution.

      The choice (by affinity, imposed, or random) depends on the teaching objectives and the dynamics of the class.

      The organization can evolve over the year according to observed needs.

      3. Managing Space and Group Dynamics

      The physical and sound environment must be carefully thought out to limit disruption.

      Managing noise

      Whispering is not innate; it has to be taught.

      One technique is to have students place a hand on their throat to feel the absence of vocal cord vibration while whispering.

      Stop signals: Use tools that spare the teacher's voice (buzzer, bell, traffic lights, or a predefined verbal signal).

      Spatial organization

      Where possible, favor a flexible classroom with movable tables. In a conventional room, it is recommended to:

      • Create "group corners".

      • Anticipate rules for moving around (especially toward self-service resources) to avoid mass movement.

      The Tetrahedron: A tool for regulating interventions

      To avoid being called on haphazardly, the teacher can use a color code for each group:

      | Color | Meaning |
      | --- | --- |
      | Green | All is well; the group is making progress. |
      | Blue | Work finished; requesting validation, or available to tutor another group. |
      | Yellow | Non-urgent question. |
      | Red | Completely stuck; urgent intervention needed. |

      4. Structuring Individual Participation

      To prevent a student from being left isolated or, conversely, from carrying the whole workload, tools for distributing tasks are needed.

      Role cards: Hand out specific functions (scribe, speaker, moderator, leader).

      It is crucial to rotate these roles at every session to ensure fairness.

      The "Placemat" method: Use a large sheet divided into individual boxes surrounding a central box for pooling ideas.

      This imposes a period of individual reflection before collective production.

      5. Assessment and Analysis of Practice

      Assessment must address cooperation as a competency distinct from the final product.

      Co-constructed success criteria: Provide an assessment grid developed with the students to clarify expectations from the start of the year.

      Sylvain Connac's Star: A self-assessment tool that lets students take a critical look at four dimensions:

      1. Harmony within the group.

      2. Quality of listening.

      3. Understanding of instructions and concepts.

      4. Time management.

      End-of-session feedback: Set aside a short time (one word or one sentence per group) to adjust arrangements for the next session.

      Conclusion

      Cooperation is an evolving process that requires patience.

      Starting with simple structures (pair work, gradual introduction of roles) helps stabilize the framework before making the arrangements more complex.

      The ultimate goal remains the autonomy and responsibility of students within the group.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Damaris et al. perform what is effectively an eQTL analysis on microbial pangenomes of E. coli and P. aeruginosa. Specifically, they leverage a large dataset of paired DNA/RNA-seq information for hundreds of strains of these microbes to establish correlations between genetic variants and changes in gene expression. Ultimately, their claim is that this approach identifies non-coding variants that affect expression of genes in a predictable manner and explain differences in phenotypes. They attempt to reinforce these claims through use of a widely regarded promoter calculator to quantify promoter effects, as well as some validation studies in living cells. Lastly, they show that these non-coding variations can explain some cases of antibiotic resistance in these microbes.

      Major comments

      Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      The authors convincingly demonstrate that they can identify non-coding variation in pangenomes of bacteria and associate these with phenotypes of interest. What is unclear is the extent by which they account for covariation of genetic variation? Are the SNPs they implicate truly responsible for the changes in expression they observe? Or are they merely genetically linked to the true causal variants. This has been solved by other GWAS studies but isn't discussed as far as I can tell here.

      We thank the reviewer for their effective summary of our study. Regarding our ability to identify variants that are causal for gene expression changes versus those that only “tag” the causal ones, here we have to again offer our apologies for not spelling out the limitation of GWAS approaches, namely the difficulty in separating associated from causal variants. This inherent difficulty is the main reason why we added the in-silico and in-vitro validation experiments; while they each have their own limitations, we argue that they all point towards providing a causal link between some of our associations and measured gene expression changes. We have amended the discussion section (e.g. at L548) to spell out our intention better and provide better context for readers who are not familiar with the pitfalls of (bacterial) GWAS.

      They need to justify why they consider the 30bp downstream of the start codon as non-coding. While this region certainly has regulatory impact, it is also definitely coding. To what extent could this confound results and how many significant associations to expression are in this region vs upstream?

      We agree with the reviewer that defining this region as “non-coding” is formally not correct, as it includes the first 10 codons of the focal gene. We have amended the text to change the definition to “cis regulatory region” and avoided using the term “non-coding” throughout the manuscript. Regarding the relevance of this including the early coding region, we have looked at the distribution of associated hits in the cis regulatory regions we have defined; the results are shown in Supplementary Figure 3.

      We quantified the distribution of cis associated variants and compared it to 2,000 permutations restricted to the -200bp to +30bp window in both E. coli (panel A) and P. aeruginosa (panel B). As can be seen, the associated variants that we have identified are mostly present in the -200bp region, and the +30bp region shows a mild depletion relative to the random expectation, which we derived through a variant position shuffling approach (2,000 replicates). Therefore, we believe that the inclusion of the early coding region results in an appreciable number of associations, which in our opinion justifies its inclusion as a putative “cis regulatory region”.
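To make the position shuffling idea concrete, the sketch below compares the observed number of variants in the +1 to +30 region against a null built by redrawing variant positions uniformly within the window. The coordinate convention (negative upstream, positive downstream of the start codon, no position 0), the uniform null, and the example positions are all assumptions made for illustration; the authors' actual shuffling scheme may differ.

```python
import random


def early_coding_depletion(positions, n_perm=2000, upstream=200, downstream=30, seed=0):
    """Compare the observed count of variants at +1..+downstream with a shuffled null.

    positions: variant coordinates relative to the start codon (hypothetical data).
    Returns (observed count, mean null count, one-sided empirical p for depletion).
    """
    rng = random.Random(seed)
    window = [p for p in range(-upstream, downstream + 1) if p != 0]
    observed = sum(1 for p in positions if p > 0)
    null_counts = []
    for _ in range(n_perm):
        # Assumed null: each variant lands uniformly at random inside the window.
        shuffled = [rng.choice(window) for _ in positions]
        null_counts.append(sum(1 for p in shuffled if p > 0))
    expected = sum(null_counts) / n_perm
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_depletion = (1 + sum(1 for c in null_counts if c <= observed)) / (n_perm + 1)
    return observed, expected, p_depletion


# Made-up variant positions for a single gene.
example_positions = [-150, -120, -80, -60, -35, -10, -5, 12]
observed, expected, p_depletion = early_coding_depletion(example_positions)
print(observed, round(expected, 2), round(p_depletion, 3))
```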

      The claim that promoter variation correlates with changes in measured gene expression is not convincingly demonstrated (although, yes, very intuitive). Figure 3 is a convoluted way of demonstrating that predicted transcription rates correlate with measured gene expression. For each variant, can you do the basic analysis of just comparing differences in promoter calculator predictions and actual gene expression? I.e. correlation between (promoter activity variant X)-(promoter activity variant Y) vs (measured gene expression variant X)-(measured gene expression variant Y). You'll probably have to

      We realize that we may have failed to properly explain how we carried out this analysis, which we did exactly in the way the reviewer suggests here. We had in fact provided four example scatterplots of the kind the reviewer was requesting as part of Figure 4. We have added a mention of their presence in the caption of Figure 3.
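The comparison the reviewer describes, differences in predicted promoter activity against differences in measured expression, can be sketched as follows. The strain names and values are invented for the example, and the helper functions are hypothetical rather than taken from the authors' code.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairwise_delta_correlation(predicted, measured):
    """Correlate (predicted X - predicted Y) with (measured X - measured Y) over strain pairs."""
    strains = sorted(predicted)
    delta_pred, delta_meas = [], []
    for a, b in combinations(strains, 2):
        delta_pred.append(predicted[a] - predicted[b])
        delta_meas.append(measured[a] - measured[b])
    return pearson(delta_pred, delta_meas)


# Hypothetical predicted transcription rates and measured expression per strain.
predicted = {"strain_1": 1.0, "strain_2": 2.0, "strain_3": 4.0}
measured = {"strain_1": 0.5, "strain_2": 1.0, "strain_3": 2.0}
r = pairwise_delta_correlation(predicted, measured)
print(round(r, 3))
```

Note that pairwise differences are not independent observations, so a p-value computed naively on them would overstate significance.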

      Figure 7 it is unclear what this experiment was. How were they tested? Did you generate the data themselves? Did you do RNA-seq (which is what is described in the methods) or just test and compare known genomic data?

      We apologize for the lack of clarity here; we have amended the figure’s caption and the corresponding section of the results (i.e. L411 and L418) to better highlight how the underlying drug susceptibility data and genomes came from previously published studies.

      Are the data and the methods presented in such a way that they can be reproduced?

      No, this is the biggest flaw of the work. The RNA-Seq experiment to start this project is not described at all as well as other key experiments. Descriptions of methods in the text are far too vague to understand the approach or rationale at many points in the text. The scripts are available on github but there is no description of what they correspond to outside of the file names and none of the data files are found to replicate the plots.

      We have taken this critique to heart, and have given more details about the experimental setup for the generation of the RNA-seq data in the methods as well as the results sections. We have also thoroughly reviewed every description of the methods we have employed to make sure they are more clearly presented to the readers. We have also updated our code repository in order to provide more information about the meaning of each script provided, although we would like to point out that we did not design the code to be general purpose, but rather to serve as open documentation of how the data was analyzed.

      Figure 8B is intended to show that the WaaQ operon is connected to known Abx resistance genes but uses the STRING method. This requires a list of genes but how did they build this list? Why look at these known ABx genes in particular? STRING does not really show evidence, these need to be substantiated or at least need to justify why this analysis was performed.

      We have amended the Methods section (“Gene interaction analysis”, L799) to better clarify how the network shown in this panel was obtained. In short, we have filtered the STRING database to identify genes connected to members of the waa operon with an interaction score of at least 0.4 (“moderate confidence”), excluding the “text mining” field. Antimicrobial resistance genes were identified according to the CARD database. We believe these changes will help the readers to better understand how we derived this interaction.
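The filtering step described above can be pictured with a small sketch. It assumes each interaction comes with per-channel scores between 0 and 1 and combines the non-text-mining channels as 1 minus the product of (1 - score). This is a simplification: STRING's own combined score also corrects for a prior, so the numbers here would not exactly match the database. The partner gene names and scores are hypothetical.

```python
def combined_score(channels, exclude=("textmining",)):
    """Naive combination of evidence channels, skipping the excluded ones (assumed formula)."""
    remaining = 1.0
    for name, score in channels.items():
        if name not in exclude:
            remaining *= 1.0 - score
    return 1.0 - remaining


def neighbours_of(seed_genes, edges, min_score=0.4):
    """Return non-seed genes linked to any seed gene with a score of at least min_score."""
    hits = {}
    for gene_a, gene_b, channels in edges:
        score = combined_score(channels)
        if score < min_score:
            continue
        if gene_a in seed_genes and gene_b not in seed_genes:
            hits[gene_b] = max(score, hits.get(gene_b, 0.0))
        elif gene_b in seed_genes and gene_a not in seed_genes:
            hits[gene_a] = max(score, hits.get(gene_a, 0.0))
    return hits


# Hypothetical edges: geneX and geneY are placeholders, not real interaction partners.
seeds = {"waaQ", "waaG"}
edges = [
    ("waaQ", "geneX", {"experimental": 0.3, "coexpression": 0.3, "textmining": 0.9}),
    ("waaG", "geneY", {"experimental": 0.1, "textmining": 0.95}),
]
hits = neighbours_of(seeds, edges)
print(hits)
```

In this toy example geneY is dropped because its support comes almost entirely from text mining, which is the behavior the exclusion is meant to achieve.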

      Are the experiments adequately replicated and statistical analysis adequate?

      An important claim on MIC of variants for supplementary table 8 has no raw data and no clear replicates available. Only figure 6, the in vitro testing of variant expression, mentions any replicates.

      We have expanded the relevant section in the Methods (“Antibiotic exposure and RNA extraction”, L778) to provide more information on the way these assays were carried out. In short, we carried out three biological replicates and took the average MIC of the two replicates in closest agreement as the representative MIC for the strain. We believe that we have followed standard practice in the field of microbiology, but we agree that more details needed to be provided in order for readers to appreciate this.
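The rule for the representative MIC can be written out as a short sketch. The function name and the example values are assumptions; the comparison is done on the raw MIC values as the text describes, although a log2 scale could also be reasonable for two-fold dilution series.

```python
from itertools import combinations


def representative_mic(replicates):
    """Average of the two replicate MICs that agree most closely (three replicates expected)."""
    if len(replicates) != 3:
        raise ValueError("expected exactly three biological replicates")
    # Pick the pair with the smallest absolute difference; ties keep the first pair found.
    a, b = min(combinations(replicates, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


# Hypothetical MIC values in ug/mL.
mic = representative_mic([2.0, 4.0, 16.0])
print(mic)
```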

      Minor comments

      Specific experimental issues that are easily addressable..

      Are prior studies referenced appropriately?

      There should be a discussion of eQTLs in this. Although these have mostly been in eukaryotes a. https://doi.org/10.1038/s41588-024-01769-9 ; https://doi.org/10.1038/nrg3891.

      We have added these two references, which provide a broader context to our study and methodology, in the introduction.

      Line 67. Missing important citation for Ireland et al. 2020 https://doi.org/10.7554/eLife.55308

      Line 69. Should mention Johns et al. 2018 (https://doi.org/10.1038/nmeth.4633) where they study promoter sequences outside of E. coli

      Line 90 - replace 'hypothesis-free' with unbiased

      We have implemented these changes.

      Line 102 - state % of DEGs relative to the entire pan-genome

      Given that the study is focused on identifying variants that were associated with changes in expression for reference genes (i.e. those present in the reference genome), we think that providing this percentage would give the false impression that our analysis includes accessory genes that are not encoded by the reference isolate, which is not what we have done.

      Figure 1A is not discussed in the text

      We have added an explicit mention of the panels in the relevant section of the results.

      Line 111: it is unclear what enrichment was being compared between, Figures 1C/D have 'Gene counts' but is of the total DEGs? How is the p-value derived? Comparing and what statistical test was performed? Comparing DEG enrichment vs the pangenome? K12 genome?

      We have amended the results and methods section, as well as Figure 1’s caption to provide more details on how this analysis was carried out.

      Line 122-123: State what letters correspond to these COG categories here

      We have implemented the clarifications and edits suggested above.

      Line 155: Need to clarify how you use k-mers in this and how they are different than SNPs. are you looking at k-mer content of these regions? K-mers up to hexamers or what? How are these compared. You can't just say we used k-mers.

      We have amended that line in the results section to more explicitly refer to the actual encoding of the k-mer variants, which were presence/absence patterns for k-mers extracted from each target gene’s promoter region separately, using our own developed method, called panfeed. We note that more details were already given in the methods section, but we do recognize that it’s better to clarify things in the results section, so that more distracted readers get the proper information about this class of genetic variants.
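To show what a presence/absence encoding of k-mers looks like, here is a minimal sketch that extracts k-mers from each strain's promoter sequence for one gene and keeps only those that vary between strains. It is an illustration of the general idea, not the panfeed implementation; the sequences, strain names, and the choice of k are made up.

```python
def kmers(seq, k):
    """Set of all k-mers found in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence(promoters, k):
    """Build a k-mer presence/absence pattern across strains for one gene's promoter.

    promoters: dict mapping strain name -> promoter sequence (hypothetical input).
    Returns (sorted strain names, {k-mer: [0/1 per strain]}) for variable k-mers only.
    """
    strains = sorted(promoters)
    per_strain = {s: kmers(promoters[s], k) for s in strains}
    all_kmers = sorted(set().union(*per_strain.values()))
    patterns = {}
    for km in all_kmers:
        row = [int(km in per_strain[s]) for s in strains]
        # k-mers present in every strain carry no signal for association testing.
        if 0 < sum(row) < len(strains):
            patterns[km] = row
    return strains, patterns


# Toy promoter sequences: strain B differs from A and C by a single substitution.
toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACGTAC"}
strains, patterns = presence_absence(toy, k=4)
print(strains, patterns)
```

A single substitution changes several overlapping k-mers at once, which is one way this encoding differs from a per-site SNP call.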

      Line 172: It would be VERY helpful to have a supplementary figure describing these types of variants, perhaps a multiple-sequence alignment containing each example

      We thank the reviewer for this suggestion. We have now added Supplementary Figure 3, which shows the sequence alignments of the cis-regulatory regions underlying each class of the genetic marker for both E. coli and P. aeruginosa.

      Figure 4: This figure is too small. Why are WaaQ and UlaE being used as examples here when you are supposed to be explicitly showing variants with strong positive correlations?

      We rearranged the figure’s layout to improve its readability. We agree that the correlation for waaQ and ulaE is weaker than for yfgJ and kgtP, but our intention was to not simply cherry-pick strong examples, but also those for which the link between predicted promoter strength and recorded gene expression was less obvious.

      Figure 4: Why is there variation between variants present and variant absent? Is this due to other changes in the variant? Should mention this in the text somewhere

      Variability in the predicted transcription rate for isolates encoding for the same variant is due to the presence of other (different) variants in the region surrounding the target variant. PromoterCalculator uses nucleotide regions of variable length (78 to 83bp) to make its predictions, while the variants we are focusing on are typically shorter (as shown in Figure 4). This results in other variants being included in the calculation and therefore slightly different predicted transcription rates for each strain. We have amended the caption of Figure 4 to provide a succinct explanation of these differences.

      Line 359: Need to talk about each supplementary figure 4 to 9 and how they demonstrate your point.

      We have expanded this section to more explicitly mention the contents of these supplementary figures and why they are relevant for the findings of this section (L425).

      Are the text and figures clear and accurate?

      Figure 4 too small

      We have fixed the figure, as described above.

      Acronyms are defined multiple times in the manuscript, sometimes not the first time they are used (e.g. SNP, InDel)

      Figure 8A - Remove red box, increase label size

      Figure 8B - Low resolution, grey text is unreadable and should be darker and higher resolution

      Line 35 - be more specific about types of carbon metabolism and catabolite repression

      Line 67 - include citation for ireland et al. 2020 https://doi.org/10.7554/eLife.55308

      Line 74 - You talk about looking in cis but don't specify how far away cis is

      Line 75 - we encoded genetic variants..... It is unclear what you mean here

      Line 104 - 'were apart of operons' should clarify you mean polycistronic or multi-gene operons. Single genes may be considered operonic units as well.

      We have addressed all the issues indicated above.

      Figure 2: There is no axis for the percents and the percents don't make sense relative to the bars they represent??

      We realize that this visualization might not have been the most clear for readers, and have made the following improvement: we have added the number of genes with at least one association before the percentage. We note that the x-axis is in log scale, which may make it seem like the light-colored bars are off. With the addition of the actual number of associated genes we think that this confusion has been removed.

      Figure 2: Figure 2B legend should clarify that these are individual examples of Differential expression between variants

      Line 198-199: This sentence doesn't make sense, 'encoded using kmers' is not descriptive enough

      Line 205: Should be upfront about that you're using the Promoter Calculator that models biophysical properties of promoter sequences to predict activity.

      Line 251: 'Scanned the non-coding sequences of the DEGs'. This is far too vague of a description of an approach. Need to clarify how you did this and I didn't see in the method. Is this an HMM? Perfect sequence match to consensus sequence? Some type of alignment?

      Line 257-259: This sentence lacks clarity

      We have implemented all the suggested changes and clarified the points that the reviewer has highlighted above.

      Line346: How were the E. coli isolates tested? Was this an experiment you did? This is a massive undertaking (1600 isolates * 12 conditions) if so so should be clearly defined

      While we have indicated in the previous paragraph that the genomes and antimicrobial susceptibility data were obtained from previously published studies, we have now modified this paragraph (e.g. L411 and L418) slightly to make this point even clearer.

      Figure 6A: The tile plot on the right side is not clearly labeled and it is unclear what it is showing and how that relates to the bar plots.

      In the revised figure, we have clarified the labeling of the heatmap to now read “Log2(Fold Change) (measured expression)” to indicate that it represents each gene’s fold changes obtained from our initial transcriptomic analysis. We have also included this information in the caption of the figure, making the relationship between the measured gene expression (heatmap) and the reporter assay data (bar plots) clear to the reader.

      Figure 6B: typo in legend 'Downreglation'

      We thank the reviewer for pointing this out. The typo has been corrected to “Down regulation” in the revised figure.

      Line 398: Need to state rationale for why Waaq operon is being investigated here. Why did you look into individual example?

      We thank the reviewer for asking for a clarification here. Our decision to investigate the waaQ gene was one of both biological relevance and empirical evidence. In our analysis associating non-coding variants with antimicrobial resistance using the Moradigaravand et al. dataset, we identified a T>C variant at position 3808241 that was associated with resistance to Tobramycin. We also observed this variant in our strain collection, where it was associated with expression changes of the gene, suggesting a possible functional impact. The waa operon is involved in LPS synthesis, a central determinant of the bacteria’s outer membrane integrity and a well established virulence factor. This provided a plausible biological mechanism through which variation could influence antimicrobial susceptibility. As its role in resistance has not been extensively characterized, this represents a good candidate for our experimental validation. We have now included this rationale in our revised manuscript (i.e. L476).

      Figure 8: Can get rid of red box

      We have now removed the red box from Figure 8 in the revised version.

      Line 463 - 'account for all kinds' is too informal

      Mix of font styles throughout document

      We have implemented all the suggestions and formatting changes indicated above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript "Cis non-coding genetic variation drives gene expression changes in the E. coli and P. aeruginosa pangenomes", Damaris and co-authors present an extensive meta-analysis, plus some useful follow up experiments, attempting to apply GWAS principles to identify the extent to which differences in gene expression between different strains within a given species can be directly assigned to cis-regulatory mutations. The overall principle, and the question raised by the study, is one of substantial interest, and the manuscript here represents a careful and fascinating effort at unravelling these important questions. I want to preface my review below (which may otherwise sound more harsh than I intend) with the acknowledgment that this is an EXTREMELY difficult and challenging problem that the authors are approaching, and they have clearly put in a substantial amount of high quality work in their efforts to address it. I applaud the work done here, I think it presents some very interesting findings, and I acknowledge fully that there is no one perfect approach to addressing these challenges, and while I will object to some of the decisions made by the authors below, I readily admit that others might challenge my own suggestions and approaches here. With that said, however, there is one fundamental decision that the authors made which I simply cannot agree with, and which in my view undermines much of the analysis and utility of the study: that decision is to treat both gene expression and the identification of cis-regulatory regions at the level of individual genes, rather than transcriptional units. Below I will expand on why I find this problematic, how it might be addressed, and what other areas for improvement I see in the manuscript:

      We thank the reviewer for their praise of our work. A careful set of replies to the major and minor critiques are reported below each point.

      In the entire discussion from lines roughly 100-130, the authors frequently dissect out apparently differentially expressed genes from non differentially expressed genes within the same operons... I honestly wonder whether this is a useful distinction. I understand that by the criteria set forth by the authors it is technically correct, and yet, I wonder if this is more due to thresholding artifacts (i.e., some genes passing the authors' reasonable-yet-arbitrary thresholds whereas others in the same operon do not), and in the process causing a distraction from an operon that is in fact largely moving in the same direction. The authors might wish to either aggregate data in some way across known transcriptional units for the purposes of their analysis, and/or consider a more lenient 'rescue' set of significance thresholds for genes that are in the same operons as differentially expressed genes. I would favor the former approach, performing virtually all of their analysis at the level of transcriptional units rather than individual genes, as much of their analysis in any case relies upon proper assignment of genes to promoters, and this way they could focus on the most important signals rather than sometimes get lost in the weeds of looking at every single gene when really what they seem to be looking at in this paper is a property OF THE PROMOTERS, not the genes. (Of course there are phenomena, such as rho dependent termination specifically titrating expression of late genes in operons, but I think on the balance the operon-level analysis might provide more insights and a cleaner analysis and discussion.)

      We agree with the reviewer that the peculiar nature of transcription in bacteria has to be taken into account in order to properly quantify the influence of cis variants in gene expression changes. We therefore added the exact analysis the reviewer suggested; that is, we ran associations between the variants in cis to the first gene of each operon and a phenotype that considered the fold-change of all genes in the operon, via a weighted average (see Methods for more details). As reported in the results section (L223), we found a similar trend as with the original analysis: we found the highest proportion of associations when encoding cis variants using k-mers (42% for E. coli and 45% for P. aeruginosa). More importantly, we found a high degree of overlap between this new “operon-level” association analysis and the original one (only including the first gene in each operon). We found a range of 90%-94% of associations overlapping for E. coli and between 75% and 91% for P. aeruginosa, depending on the variant type. We note that operon definitions are less precise for P. aeruginosa, which might explain the higher variability in the level of overlap. We have added the results of this analysis in the results section.
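
      The operon-level phenotype described in this reply can be sketched as follows. This is only a minimal illustration, not the authors' code: the reply states that a weighted average of fold changes was used and refers to the Methods for details, so the weighting scheme shown here (each gene's mean expression level) and the names `operon_phenotype`, `log2fc`, and `weights` are assumptions, and the gene names and values are made up.

```python
def operon_phenotype(log2fc, weights):
    """Weighted average of per-gene log2 fold changes for one operon.

    log2fc  -- dict mapping gene name to its log2 fold change
    weights -- dict mapping gene name to a non-negative weight
               (ASSUMPTION: mean expression level of the gene; the
               source only says "weighted average")
    """
    total = sum(weights[g] for g in log2fc)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(log2fc[g] * weights[g] for g in log2fc) / total


# Hypothetical three-gene operon; all numbers are illustrative only.
example_fc = {"geneA": 2.0, "geneB": 1.0, "geneC": 0.0}
example_w = {"geneA": 1.0, "geneB": 1.0, "geneC": 2.0}
operon_value = operon_phenotype(example_fc, example_w)
print(operon_value)  # 0.75
```

      Under this scheme a single value per operon would be tested against the variants in cis to the operon's first gene, so genes that narrowly miss a per-gene threshold still contribute to the phenotype.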

      This also leads to a more general point, however, which I think is potentially more deeply problematic. At the end of the day, all of the analysis being done here centers on the cis regulatory logic upstream of each individual open reading frame, even though in many cases (i.e., genes after the first one in multi-gene operons), this is not where the relevant promoter is. This problem, in turn, raises potential misattributions of causality running in both directions, where the causal impact of a bona fide promoter mutation on many genes in an operon may only be associated with the first gene, or on the other side, where a mutation that co-occurs with, but is causally independent from, an actual promoter mutation may be flagged as the one driving an expression change. This becomes an especially serious issue in cases like ulaE, for genes that are not the first gene in an operon (at least according to standard annotations, the UlaE transcript should be part of a polycistronic mRNA beginning from the ulaA promoter, and the role played by cis-regulatory logic immediately upstream of ulaE is uncertain and certainly merits deeper consideration). I suspect that many other similar cases likewise lurk in the dataset used here (perhaps even more so for the Pseudomonas data, where the operon definitions are likely less robust). Of course there are many possible explanations, such as a separate ulaE promoter only in some strains, but this should perhaps be carefully stated and explored, and seems likely to be the exception rather than the rule.

      While we again agree with the reviewer that some of our associations might not result in a direct causal link because the focal variant may not belong to an actual promoter element, we also want to point out how the ability to identify the composition of transcriptional units in bacteria is far from a solved problem (see references at the bottom of this comment, two in general terms, and one characterizing a specific example), even for a well-studied species such as E. coli. Therefore, even if carrying out associations at the operon level (e.g. by focusing exclusively on variants in cis for the first gene in the operon) might be theoretically correct, a number of the associations we find further down the putative operons might be the result of a true biological signal.

      1. Conway, T., Creecy, J. P., Maddox, S. M., Grissom, J. E., Conkle, T. L., Shadid, T. M., Teramoto, J., San Miguel, P., Shimada, T., Ishihama, A., Mori, H., & Wanner, B. L. (2014). Unprecedented High-Resolution View of Bacterial Operon Architecture Revealed by RNA Sequencing. mBio, 5(4), 10.1128/mbio.01442-14. https://doi.org/10.1128/mbio.01442-14

      2. Sáenz-Lahoya, S., Bitarte, N., García, B., Burgui, S., Vergara-Irigaray, M., Valle, J., Solano, C., Toledo-Arana, A., & Lasa, I. (2019). Noncontiguous operon is a genetic organization for coordinating bacterial gene expression. Proceedings of the National Academy of Sciences, 116(5), 1733–1738. https://doi.org/10.1073/pnas.1812746116

      3. Zehentner, B., Scherer, S., & Neuhaus, K. (2023). Non-canonical transcriptional start sites in E. coli O157:H7 EDL933 are regulated and appear in surprisingly high numbers. BMC Microbiology, 23(1), 243. https://doi.org/10.1186/s12866-023-02988-6

      Another issue with the current definition of regulatory regions, which should perhaps also be accounted for, is that it is likely that for many operons, the 'regulatory regions' of one gene might overlap the ORF of the previous gene, and in some cases actual coding mutations in an upstream gene may contaminate the set of potential regulatory mutations identified in this dataset.

      We agree that defining regulatory regions might be challenging, and that those regions might overlap with coding regions, either for the focal gene or the one immediately upstream. For these reasons we have defined a wide region to identify putative regulatory variants (-200 to +30 bp around the start codon of the focal gene). We believe this relatively wide region allows us to capture the most cis genetic variation.
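
      The window definition given in this reply (-200 to +30 bp around the start codon) can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' implementation: coordinates are taken to be 1-based and inclusive, the genome is treated as linear, and the function names `cis_window` and `overlaps_orf` are hypothetical. The second helper flags the case raised by the reviewer, where the window reaches into the coding sequence of a neighbouring gene.

```python
def cis_window(start_codon_pos, strand, upstream=200, downstream=30):
    """Return (begin, end) of the putative regulatory window.

    start_codon_pos -- coordinate of the first base of the start codon
    strand          -- "+" or "-"
    Coordinates are assumed 1-based and inclusive; circular genomes
    are not handled (the window is simply clipped at position 1).
    """
    if strand == "+":
        begin = start_codon_pos - upstream
        end = start_codon_pos + downstream
    elif strand == "-":
        # On the reverse strand, "upstream" lies at higher coordinates.
        begin = start_codon_pos - downstream
        end = start_codon_pos + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, begin), end


def overlaps_orf(window, orf):
    """True if the (begin, end) window shares any base with the ORF."""
    return window[0] <= orf[1] and orf[0] <= window[1]


window_fwd = cis_window(1000, "+")   # (800, 1030)
window_rev = cis_window(1000, "-")   # (970, 1200)
# Hypothetical upstream gene ending 50 bp before the start codon.
print(overlaps_orf(window_fwd, (400, 950)))  # True
```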

      Taken together, I feel that all of the above concerns need to be addressed in some way. At the absolute barest minimum, the authors need to acknowledge the weaknesses that I have pointed out in the definition of cis-regulatory logic at a gene level. I think it would be far BETTER if they performed a re-analysis at the level of transcriptional units, which I think might substantially strengthen the work as a whole, but I recognize that this would also constitute a substantial amount of additional effort.

      As indicated above, we have added a section in the results section to report on the analysis carried out at the level of operons as individual units, with more details provided in the methods section. We believe these results, which largely overlap with the original analysis, are a good way to recognize the limitation of our approach and to acknowledge the importance of gaining a better knowledge of the number and composition of transcriptional units in bacteria, of which, as the references above indicate, we still have an incomplete understanding.

      Having reached the end of the paper, and considering the evidence and arguments of the authors in their totality, I find myself wondering how much local x background interactions - that is, the effects of cis regulatory mutations (like those being considered here, with or without the modified definitions that I proposed above) IN THE CONTEXT OF A PARTICULAR STRAIN BACKGROUND, might matter more than the effects of the cis regulatory mutations per se. This is a particularly tricky problem to address because it would require a moderate number of targeted experiments with a moderate number of promoters in a moderate number of strains (which of course makes it maximally annoying since one can't simply scale up hugely on either axis individually and really expect to tease things out). I think that trying to address this question experimentally is FAR beyond the scope of the current paper, but I think perhaps the authors could at least begin to address it by acknowledging it as a challenge in their discussion section, and possibly even identify candidate promoters that might show the largest divergence of activities across strains when there IS no detectable cis regulatory mutation (which might be indicative of local x background interactions), or those with the largest divergences of effect for a given mutation across strains. A differential expression model incorporating shrinkage is essential in such analysis to avoid putting too much weight on low expression genes with a lot of Poisson noise.

      We again thank the reviewer for their thoughtful comments on the limitations of correlative studies in general, and microbial GWAS in particular. With regard to microbial GWAS, we feel we may have failed to properly explain how the implementation we have used allows us to, at least partially, correct for population structure effects. That is, the linear mixed model we have used relies on population structure to remove the part of the association signal that is due to the genetic background and thus focus the analysis on the specific loci. Obviously, examples in which strong epistatic interactions are present would not be accounted for, but those would be extremely challenging to measure or predict at scale, as the reviewer rightfully suggests. We have added a brief recap of the ability of microbial GWAS to account for population structure in the results section (“A large fraction of gene expression changes can be attributed to genetic variations in cis regulatory regions”, e.g. L195).

      I also have some more minor concerns and suggestions, which I outline below:

      It seems that the differential expression analysis treats the lab reference strains as the 'centerpoint' against which everything else is compared, and yet I wonder if this is the best approach... it might be interesting to see how the results differ if the authors instead take a more 'average' strain (either chosen based on genetics or transcriptomics) as a reference and compared everything else to that.

      While we don’t necessarily disagree with the reviewer that a “wild” strain would be better to compare against, we think that our choice to go for the reference isolates is still justified on two grounds. First, while it is true that comparing against a reference introduces biases in the analysis, this concern would not be removed had we chosen another strain as reference; which strain would then be best as a reference to compare against? We think that the second point provides an answer to this question; the “traditional” reference isolates have a rich ecosystem of annotations, experimental data, and computational predictions. These can in turn be used for validation and hypothesis generation, which we have done extensively in the manuscript. Had we chosen a different reference isolate we would have had to still map associations to the traditional reference, resulting in a probable reduction in precision. An example that will likely resonate with this reviewer is that we have used experimentally-validated and high quality computational operon predictions to look into likely associations between cis-variants and “operon DEGs”. This analysis would have likely been of worse quality had we used another strain as reference, for which operon definitions would have had to come from lower-quality predictions or be “lifted” from the traditional reference.

      Line 104 - the statement about the differentially expressed genes being "part of operons with diverse biological functions" seems unclear - it is not apparent whether the authors are referring to diversity of function within each operon, or between the different operons, and in any case one should consider whether the observation reflects any useful information or is just an apparently random collection of operons.

      We agree that this formulation could create confusion and we have elected to remove the expression “with diverse biological functions”, given that we discuss those functions immediately after that sentence.

      Line 292 - I find the argument here somewhat unconvincing, for two reasons. First, the fact that only half of the observed changes went in the same direction as the GWAS results would indicate, which is trivially a result that would be expected by random chance, does not lend much confidence to the overall premise of the study that there are meaningful cis regulatory changes being detected (in fact, it seems to argue that the background in which a variant occurs may matter a great deal, at least as much as the cis regulatory logic itself). Second, in order to even assess whether the GWAS is useful to "find the genetic determinants of gene expression changes" as the authors indicate, it would be necessary to compare to a reasonable, non-straw-man, null approach simply identifying common sequence variants that are predicted to cause major changes in sigma 70 binding at known promoters; such a test would be especially important given the lack of directional accuracy observed here. Along these same lines, it is perhaps worth noting, in the discussion beginning on line 329, that the comparison is perhaps biased in favor of the GWAS study, since the validation targets here were prioritized based on (presumably strong) GWAS data.

      We thank the reviewer for prompting us to reason about the results of the in-vitro validation experiments. We agree that the measured gene expression changes agree only partly with those measured with the reporter system, and that this discrepancy could likely be attributed to regulatory elements that are not in cis, and thus were not present in the in-vitro reporter system. We have noted this possibility in the discussion. Additionally, we have amended the results section to note that even though the prediction of the direction of gene expression change was not as accurate as could be expected, the prediction of whether a change would be present (thus ignoring directionality) was much more accurate.

      I don't find the Venn diagrams in Fig 7C-D useful or clear given the large number of zero-overlap regions, and would strongly advocate that the authors find another way to show these data.

      While we are aware of alternative ways to show overlap between sets, such as UpSet plots, we don’t actually find them that much easier to parse. We actually think that the simple and direct Venn diagrams we have drawn convey the clear message that overlaps only exist between certain drug classes in E. coli, and virtually none exist for P. aeruginosa. We have added a comment on the lack of overlap between all drug classes and the differences between the two species in the results section (i.e. L436 and L465).

      In the analysis of waa operon gene expression beginning on line 400, it is perhaps important to note that most of the waa operon doesn't do anything in laboratory K12 strains due to the lack of complete O-antigen... the same is not true, however, for many wild/clinical isolates. It would be interesting to see how those results compare, and also how the absolute TPMs (rather than just LFCs) of genes in this operon vary across the strains being investigated during TOB treatment.

      We thank the reviewer for this helpful suggestion. We examined the absolute expression (TPMs) of waa operon genes at baseline (A) and following exposure to Tobramycin (B). The representative TPMs per strain were obtained by averaging across biological replicates. We observed constitutive expression of the genes in the reference strain (MG1655) and the other isolates containing the variant of interest (MC4100, BW25113). In contrast, strains lacking the variant of interest (IAI76 and IAI78) showed lower expression of these operon genes under both conditions. Strain IAI77, on the other hand, displayed increased expression of a subset of waa genes after Tobramycin exposure, indicating strain-specific variation in transcriptional response. While the reference isolate might not have the O-antigen, it certainly expresses the waa operon, both constitutively and under TOB exposure.

      I don't think that the second conclusion on lines 479-480 is fully justified by the data, given both the disparity in available annotation information between the two species, AND the fact that only two species were considered.

      While we feel that the “Discussion” section of a research paper allows for speculative statements, we have to concede that we have perhaps overreached here. We have amended this sentence to be more cautious and not mislead readers.

      Line 118: "Double of DEGs"

      Line 288 - presumably these are LOG fold changes

      Fig 6b - legend contains typos

      Line 661 - please report the read count (more relevant for RNA-seq analysis) rather than Gb

      We thank the reviewer for pointing out the need to make these edits. We have implemented them all.

      Source code - I appreciate that the authors provide their source code on github, but it is very poorly documented - both a license and some top-level documentation about which code goes with each major operation/conclusion/figure should be provided. Also, ipython notebooks are in general a poor way in my view to distribute code, due to their encouragement of nonlinear development practices; while they are fine for software development, actual complete python programs along with accompanying source data would be preferable.

      We agree with the reviewer that a software license and some documentation about what each notebook is about is warranted, and we have added them both. While we agree that for “consumer-grade” software jupyter notebooks are not the most ergonomic format, we believe that as a documentation of how one-time analyses were carried out they are actually one of the best formats we could think of. They in fact allow for code and outputs to be presented alongside each other, which greatly helped us to iterate on our research and to ensure that what was presented in the manuscript matched the analyses we reported in the code. This is of course up for debate and ultimately specific to someone’s taste, and so we will keep the reviewer’s critique in mind for our next manuscript. And, if we ever decide to package the analyses presented in the manuscript as a “consumer-grade” application for others to use, we would follow higher standards of documentation and design.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Damaris et al. collected genome sequences and transcriptomes from isolates from two bacterial species. Data for E. coli were produced for this paper, while data for P. aeruginosa had been measured earlier. The authors integrated these data to detect genes with differential expression (DE) among isolates as well as cis-expression quantitative trait loci (cis-eQTLs). The authors used sample sizes that were adequate for an initial exploration of gene regulatory variation (n=117 for E. coli and n=413 for P. aeruginosa) and were able to discover cis eQTLs at about 39% of genes. In a creative addition, the authors compared their results to transcription rates predicted from a biophysical promoter model as well as to annotated transcription factor binding sites. They also attempted to validate some of their associations experimentally using GFP-reporter assays. Finally, the paper presents a mapping of antibiotic resistance traits. Many of the detected associations for this important trait group were in non-coding genome regions, suggesting a role of regulatory variation in antibiotic resistance.

      A major strength of the paper is that it covers an impressive range of distinct analyses, some of which in two different species. Weaknesses include the fact that this breadth comes at the expense of depth and detail. Some sections are underdeveloped, not fully explained and/or thought-through enough. Important methodological details are missing, as detailed below.

      We thank the reviewer for highlighting the strengths of our study. We hope that our replies to their comments and the other two reviewers will address some of the limitations.

      Major comments:

      1. An interesting aspect of the paper is that genetic variation is represented in different ways (SNPs & indels, IGR presence/absence, and k-mers). However, it is not entirely clear how these three different encodings relate to each other. Specifically, more information should be given on these two points:

      • It is not clear how "presence/absence of intergenic regions" are different from larger indels.

      In order to better guide readers through the different kinds of genetic variants we considered, we have added a brief explanation about what “promoter switches” are in the introduction (“meaning that the entire promoter region may differ between isolates due to recombination events”, L56). We believe this clarifies how they are very different in character from a large deletion. We have kept the reference to the original study (10.1073/pnas.1413272111) describing how widespread these switches are in E. coli as a way for readers to discover more about them.

      • I recommend providing more narration on how the k-mers compare to the more traditional genetic variants (SNPs and indels). It seems like the k-mers include the SNPs and indels somehow? More explanation would be good here, as k-mer based mapping is not usually done in other species and is not standard practice in the field. Likewise, how is multiple testing handled for association mapping with k-mers, since presumably each gene region harbors a large number of k-mers, potentially hugely increasing the multiple testing burden?

      We indeed agree with the reviewer that representing genetic variants as k-mers would encompass short variants (SNPs/InDels) as well as larger variants and promoter presence/absence patterns. We believe that this assumption is validated by the fact that we identify the highest proportion of DEGs with a significant association when using this representation of variants (Figure 2A, 39% for both species). We have added a reference to a recent review on the advantages of k-mer methods for population genetics (10.1093/molbev/msaf047) in the introduction. Regarding the issue of multiple testing correction, we have employed a commonly recognized approach that, unlike a crude Bonferroni correction using the number of tested variants, allows for a realistic correction of association p-values. We used the number of unique presence/absence patterns, which can be shared between multiple genetic variants, and applied a Bonferroni correction using this number rather than the number of variants tested. We have expanded the corresponding section in the methods (e.g. L697) to better explain this point for readers not familiar with this approach.
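
      The pattern-based correction described in this reply can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `pattern_bonferroni_threshold`, the family-wise level `alpha=0.05`, and the toy variants are assumptions, and a pattern and its complement are counted here as two distinct patterns, which a real implementation may treat differently.

```python
def pattern_bonferroni_threshold(variant_patterns, alpha=0.05):
    """Bonferroni threshold based on unique presence/absence patterns.

    variant_patterns -- dict mapping a variant id to a sequence of 0/1
                        values, one per isolate, in a fixed isolate order
    alpha            -- family-wise error level (ASSUMED value)
    Returns (threshold, number_of_unique_patterns).
    """
    unique = {tuple(p) for p in variant_patterns.values()}
    if not unique:
        raise ValueError("no variants provided")
    return alpha / len(unique), len(unique)


# Five hypothetical variants across four isolates, three unique patterns.
patterns = {
    "kmer_1": [1, 0, 1, 0],
    "kmer_2": [1, 0, 1, 0],  # same pattern as kmer_1
    "kmer_3": [0, 1, 1, 0],
    "snp_1": [0, 1, 1, 0],   # same pattern as kmer_3
    "snp_2": [1, 1, 0, 0],
}
threshold, n_patterns = pattern_bonferroni_threshold(patterns)
print(n_patterns)  # 3
```

      Because many overlapping k-mers tag the same underlying variant and therefore share one pattern, dividing by the number of patterns rather than the number of k-mers avoids penalising the test for redundant markers.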

      1. What was the distribution of association effect sizes for the three types of variants? Did IGRs have larger effects than SNPs as may be expected if they are indeed larger events that involve more DNA differences? What were their relative allele frequencies?

      We appreciate the suggestion made by the reviewer to look into the distribution of effect sizes divided by variant type. We have now evaluated the distribution of the effect sizes and allele frequencies for the genetic markers (SNPs/InDels, IGRs, and k-mers) for both species (Supplementary Figure 2). In E. coli, IGR variants showed somewhat larger median effect sizes (|β| = 4.5) than SNPs (|β| = 3.8), whereas k-mers displayed the widest distribution (median |β| = 5.2). In P. aeruginosa, the trend differed, with IGRs exhibiting smaller effects (median |β| = 3.2) compared to SNPs/InDels (median |β| = 5.1) and k-mers (median |β| = 6.2). With respect to allele frequencies, SNPs/InDels generally occurred at lower frequencies (median AF = 0.34 for E. coli, median AF = 0.33 for P. aeruginosa), whereas IGRs (median AF = 0.65 for E. coli and 0.75 for P. aeruginosa) and k-mers (median AF = 0.71 for E. coli and 0.65 for P. aeruginosa) were more often at intermediate to higher frequencies. We have added a visualization for the distribution of effect sizes (Supplementary Figure 2).

      1. The GFP-based experiments attempting to validate the promoter effects for 18 genes are laudable, and the fact that 16 of them showed differences is nice. However, the fact that half of the validation attempts yielded effects in the opposite direction of what was expected is quite alarming. I am not sure this really "further validates" the GWAS in the way the authors state in line 292 - in fact, quite the opposite in that the validations appear random with regards to what was predicted from the computational analyses. How do the authors interpret this result? Given the higher concordance between GWAS, promoter prediction, and DE, are the GFP assays just not relevant for what is going on in the genome? If not, what are these assays missing? Overall, more interpretation of this result would be helpful.

      We thank the reviewer for their comment, which is similar in nature to that raised by reviewer #2 above. As noted in our reply above, we have amended the results and discussion to indicate that although the direction of gene expression change was not predicted with high accuracy, focusing on the magnitude (or rather on whether there would be a change in gene expression, regardless of the direction) resulted in a higher accuracy. We postulate that the cases in which the direction of the change was not correctly identified could be due to the influence of other genetic elements in trans with the gene of interest.

      1. On the same note, it would be really interesting to expand the GFP experiments to promoters that did not show association in the GWAS. Based on Figure 6, effects of promoter differences on GFP reporters seem to be very common (all but three were significant). Is this a higher rate than for the average promoter with sequence variation but without detected association? A handful of extra reporter experiments might address this. My larger question here is: what is the null expectation for how much functional promoter variation there is?

      We thank the reviewer for this comment. We agree that estimating the null expectation for functional promoter variation would require testing promoter alleles with sequence variation that are not associated in the GWAS. Such experiments, which would directly address whether the observed effects in our study exceed background, would have required us to prepare multiple constructs, which was unfortunately not possible for us due to staff constraints. We therefore elected to clarify the scope of our GFP reporter assays instead. These experiments were designed as a paired comparison of the wild-type and the GWAS-associated variant alleles of the same promoter in an identical reporter background, with the aim of testing allele-specific functional effects for GWAS hits (Supplementary Figure 6). We also included a comparison in GFP fluorescence between the promoterless vector (pOT2) and promoter-containing constructs; we observed higher GFP signals in all but four (yfgJ, fimI, agaI, and yfdQ) variant-containing promoter constructs, which indicates that for most of the constructs we cloned active promoter elements. We have revised the manuscript text accordingly to reflect this clarification and included the control in the supplementary information as Supplementary Figure 6.

      1. Were the fold-changes in the GFP experiments statistically significant? Based on Figure 6 it certainly looks like they are, but this should be spelled out, along with the test used.

      We thank the reviewer for pointing this out. We have revised Figure 6 to indicate significant differences between the test and control reporter constructs. We used a paired Student’s t-test on the matched plate/time point measurements. We also corrected for multiple testing using the Benjamini-Hochberg correction. As seen in the updated Figure 6A, 16 out of the 18 reporter constructs displayed significant differences (adjusted p-value
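
      For readers unfamiliar with the multiple-testing step named in this reply, the Benjamini-Hochberg adjustment can be sketched as below. This is a generic textbook implementation, not the authors' code; the function name `benjamini_hochberg` and the raw p-values are made up, and in practice the raw p-values would come from a paired t-test on the matched measurements (for example via `scipy.stats.ttest_rel`, which is an assumption about tooling, not something the source states).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative raw p-values for four hypothetical reporter constructs.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

      With these inputs the adjusted values are 0.04, about 0.0533, about 0.0533, and 0.20, so only the first construct would pass a 0.05 cut-off.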

      1. What was the overall correlation between GWAS-based fold changes and those from the GFP-based validation? What does Figure 6A look like as a scatter plot comparing these two sets of values?

      We thank the reviewer for this helpful suggestion, which allows us to more closely look into the results of our in-vitro validation. We performed a direct comparison of RNAseq fold changes from the GWAS (x-axis) with the GFP reporter measurements (y-axis) as depicted in the figure above. The overall correlation between the two was weak (Pearson r = 0.17), reflecting the lack of thorough agreement between the associations and the reporter construct. We however note that the two metrics are not directly comparable in our opinion, since on the x-axis we are measuring changes in gene expression and on the y-axis changes in fluorescence expression, which is downstream from it. As mentioned above and in reply to a comment from reviewer 2, the agreement between measured gene expression and all other in-silico and in-vitro techniques increases when ignoring the direction of the change. Overall, we believe that these results partly validate our associations and predictions, while indicating that other factors in trans with the regulatory region contribute to changes in gene expression, which is to be expected. The scatter plot has been included as a new supplementary figure (Supplementary Figure 7).
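
      The comparison described in this reply, with and without the direction of change, can be sketched as follows. This is an illustration only: the helper names `pearson_r` and `direction_agreement` are hypothetical, the fold-change values are made up, and correlating absolute values is just one possible way of "ignoring the direction", not necessarily the one used in the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the inputs")
    return sxy / math.sqrt(sxx * syy)


def direction_agreement(x, y):
    """Fraction of pairs whose two values have the same sign."""
    same = sum(1 for a, b in zip(x, y) if a * b > 0)
    return same / len(x)


# Made-up log2 fold changes: RNA-seq (x-axis) vs GFP reporter (y-axis).
rna = [2.0, -1.5, 3.0, -2.0]
gfp = [1.0, 1.2, -0.5, -0.8]

r_signed = pearson_r(rna, gfp)
# ASSUMPTION: "ignoring direction" modelled as correlating magnitudes.
r_magnitude = pearson_r([abs(v) for v in rna], [abs(v) for v in gfp])
agreement = direction_agreement(rna, gfp)
print(agreement)  # 0.5
```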

      1. Was the SNP analyzed in the last Results section significant in the gene expression GWAS? Did the DE results reported in this final section correspond to that GWAS in some way?

      The T>C SNP upstream of waaQ did not show significant association with gene expression in our cis GWAS analysis. Instead, this variant was associated with resistance to tobramycin when referencing data from Danesh et al, and we observed the variant in our strain collection. We subsequently investigated whether this variant also influenced expression of the waa operon under sub-inhibitory tobramycin exposure. The differential expression results shown in the final section therefore represent a functional follow-up experiment, and not a direct replication of the GWAS presented in the first part of the manuscript.

      1. Line 470: "Consistent with the differences in the genetic structure of the two species" It is not clear what differences in genetic structure this refers to. Population structure? Genome architecture? Differences in the biology of regulatory regions?

      The awkwardness of that sentence is perhaps the consequence of our assumption that readers would be aware of the population genetics differences between the two species. We have however realized that little literature (if any!) is available about these differences, which we have observed during the course of this and other studies we have carried out. As a result, we agree that we cannot assume that the reader is similarly familiar with these differences, and have changed that sentence (i.e. L548) to more directly address the differences between the two species, which will presumably result in a diverse population structure. We thank the reviewer for making us aware of a gap in the literature concerning the comparison of pangenome structures across relevant species.

      1. Line 480: the reference to "adaption" is not warranted, as the paper contains no analyses of evolutionary patterns or processes. Genetic variation is not the same as adaptation.

      We have amended this sentence to be more adherent to what we can conclude from our analyses.

      1. There is insufficient information on how the E. coli RNA-seq data was generated. How was RNA extracted? Which QC was done on the RNA; what was its quality? Which library kits were used? Which sequencing technology? How many reads? What QC was done on the RNA-seq data? For this section, the Methods are seriously deficient in their current form and need to be greatly expanded.

      We thank the reviewer for highlighting the need for clearer methodological detail. We have expanded this section (i.e. L608) to fully describe the generation and quality control of the E. coli RNA-seq data including RNA extraction and sequencing platform.

      1. How were the DEG p-values adjusted for multiple testing?

      As indicated in the methods section (“Differential gene expression and functional enrichment analysis”), we have used DESeq2 for E. coli, and LPEseq for P. aeruginosa. Both methods use the statistical framework of the False Discovery Rate (FDR) to compute an adjusted p-value for each gene. We have added a brief note in the methods stating that we followed the standard practice indicated by both software packages.

      1. Were there replicates for the E. coli strains? The methods do not say, but there is a hint there might have been replicates given their absence was noted for the other species.

      In the context of providing more information about the transcriptomics experiments for E. coli, we have also more clearly indicated that we have two biological replicates for the E. coli dataset.

      1. There needs to be more information on the "pattern-based method" that was used to correct the GWAS for multiple tests. How does this method work? What genome-wide threshold did it end up producing? Was there adjustment for the number of genes tested in addition to the number of variants? Was the correction done per variant class or across all variant classes?

      In line with an earlier comment from this reviewer, we have expanded the section in the Methods (e.g. L689) that explains how this correction worked to include as many details as possible, in order to provide the readers with the full context under which our analyses were carried out.

      1. For a paper that, at its core, performs a cis-eQTL mapping, it is an oversight that there seems not to be a single reference to the rich literature in this space, comprising hundreds of papers, in other species ranging from humans, many other animals, to yeast and plants.

      We thank both reviewers #1 and #3 for pointing out this lack of references to the extensive literature on the subject. In the introduction, we have added a number of references about the applications of eQTL studies, and specifically their application in microbial pangenomes, which we believe is more relevant to our study.

      Minor comments:

      1. I wasn't able to understand the top panels in Figure 4. For ulaE, most strains have the solid colors, and the corresponding bottom panel shows mostly red points. But for waaQ, most strains have solid color in the top panel, but only a few strains in the bottom panel are red. So solid color in the top does not indicate a variant allele? And why are there so many solid alleles; are these all indels? Even if so, for kgtP, the same colors (i.e., nucleotides) seem to seamlessly continue into the bottom, pale part of the top panel. How are these strains different genotypically? Are these blocks of solid color counted as one indel or several SNPs, or somehow as k-mer differences? As the authors can see, these figures are really hard to understand and should be reworked. The same comment applies to Figure 5, where it seems that all (!) strains have the "variant"?

      We thank the reviewer for pointing out some limitations with our visualizations, most importantly with the way we explained how to read those two figures. We have amended the captions to more explicitly explain what is shown. The solid colors in the “sequence pseudo-alignment” panels indicate the focal cis variant, which is indicated in red in the corresponding “predicted transcription rate” panels below. In the case of Figure 5, the solid color indicates instead the position of the TFBS in the reference.

      1. Figure 1A & B: It would be helpful to add the total number of analyzed genes somewhere so that the numbers denoted in the colored outer rings can be interpreted in comparison to the total.

      We have added the total number of genes being considered for either species in the legend.

      1. Figure 1C & D: It would be better to spell out the COG names in the figure, as it is cumbersome for the reader to have to look up what the letters stand for in a supplementary table in a separate file.

      While we do not disagree with the awkwardness of having to move to a supplementary table to identify the full name of a COG category, we also would like to point out that the very long names of each category would clutter the figure to a degree that would make it difficult to read. We had indeed attempted something similar to what the reviewer suggests in early drafts of this manuscript, leading to small and hard to read labels. We have therefore left the full names of each COG category in Supplementary Table 3.

      1. Line 107: "Similarly," does not fit here as the following example (with one differentially expressed gene in an operon) is conceptually different from the one before, where all genes in the operon were differentially expressed.

      We agree and have amended the sentence accordingly.

      1. Figure 5 bottom panel: it is odd that on the left the swarm plots (i.e., the dots) are on the inside of the boxplots while on the right they are on the outside.

      We have fixed the position of the dots so that they are centered with respect to the underlying boxplots.

      1. It is not clear to me how only one or a few genes in an operon can show differential mRNA abundance. Aren't all genes in an operon encoded by the same mRNA? If so, shouldn't this mRNA be up- or downregulated in the same manner for all genes it encodes? As I am not closely familiar with bacterial systems, it is well possible that I am missing some critical fact about bacterial gene expression here. If this is not an analysis artifact, the authors could briefly explain how this observation is possible.

      We thank the reviewer for their comment, which again echoes one of the main concerns from reviewer #2. As noted in our reply above, multiple studies (see the three we have indicated above in our reply to reviewer #2) have established that bacteria encode multiple “non-canonical” transcriptional units (i.e. operons), due to the presence of accessory terminators and promoters. This, together with other biological effects such as the presence of mRNA molecules of different lengths due to active transcription and degradation, and technical noise induced by RNA isolation and sequencing, can result in variability in the estimation of abundance for each gene.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In their manuscript "Cis non-coding genetic variation drives gene expression changes in the E. coli and P. aeruginosa pangenomes", Damaris and co-authors present an extensive meta-analysis, plus some useful follow up experiments, attempting to apply GWAS principles to identify the extent to which differences in gene expression between different strains within a given species can be directly assigned to cis-regulatory mutations. The overall principle, and the question raised by the study, is one of substantial interest, and the manuscript here represents a careful and fascinating effort at unravelling these important questions. I want to preface my review below (which may otherwise sound more harsh than I intend) with the acknowledgment that this is an EXTREMELY difficult and challenging problem that the authors are approaching, and they have clearly put in a substantial amount of high quality work in their efforts to address it. I applaud the work done here, I think it presents some very interesting findings, and I acknowledge fully that there is no one perfect approach to addressing these challenges, and while I will object to some of the decisions made by the authors below, I readily admit that others might challenge my own suggestions and approaches here. With that said, however, there is one fundamental decision that the authors made which I simply cannot agree with, and which in my view undermines much of the analysis and utility of the study: that decision is to treat both gene expression and the identification of cis-regulatory regions at the level of individual genes, rather than transcriptional units. Below I will expand on why I find this problematic, how it might be addressed, and what other areas for improvement I see in the manuscript:

      In the entire discussion from lines roughly 100-130, the authors frequently dissect out apparently differentially expressed genes from non-differentially expressed genes within the same operons... I honestly wonder whether this is a useful distinction. I understand that by the criteria set forth by the authors it is technically correct, and yet, I wonder if this is more due to thresholding artifacts (i.e., some genes passing the authors' reasonable-yet-arbitrary thresholds whereas others in the same operon do not), and in the process causing a distraction from an operon that is in fact largely moving in the same direction. The authors might wish to either aggregate data in some way across known transcriptional units for the purposes of their analysis, and/or consider a more lenient 'rescue' set of significance thresholds for genes that are in the same operons as differentially expressed genes. I would favor the former approach, performing virtually all of their analysis at the level of transcriptional units rather than individual genes, as much of their analysis in any case relies upon proper assignment of genes to promoters, and this way they could focus on the most important signals rather than sometimes get lost in the weeds of looking at every single gene when really what they seem to be looking at in this paper is a property OF THE PROMOTERS, not the genes. (Of course there are phenomena, such as Rho-dependent termination specifically titrating expression of late genes in operons, but I think on balance the operon-level analysis might provide more insights and a cleaner analysis and discussion.)

      This also leads to a more general point, however, which I think is potentially more deeply problematic. At the end of the day, all of the analysis being done here centers on the cis regulatory logic upstream of each individual open reading frame, even though in many cases (i.e., genes after the first one in multi-gene operons), this is not where the relevant promoter is. This problem, in turn, raises potential misattributions of causality running in both directions, where the causal impact of a bona fide promoter mutation on many genes in an operon may only be associated with the first gene, or on the other side, where a mutation that co-occurs with, but is causally independent from, an actual promoter mutation may be flagged as the one driving an expression change. This becomes an especially serious issue in cases like ulaE, for genes that are not the first gene in an operon (at least according to standard annotations, the UlaE transcript should be part of a polycistronic mRNA beginning from the ulaA promoter), and the role played by cis-regulatory logic immediately upstream of ulaE is uncertain and certainly merits deeper consideration. I suspect that many other similar cases likewise lurk in the dataset used here (perhaps even more so for the Pseudomonas data, where the operon definitions are likely less robust). Of course there are many possible explanations, such as a separate ulaE promoter only in some strains, but this should perhaps be carefully stated and explored, and seems likely to be the exception rather than the rule. Another issue with the current definition of regulatory regions, which should perhaps also be accounted for, is that it is likely that for many operons, the 'regulatory regions' of one gene might overlap the ORF of the previous gene, and in some cases actual coding mutations in an upstream gene may contaminate the set of potential regulatory mutations identified in this dataset.
Taken together, I feel that all of the above concerns need to be addressed in some way. At the absolute barest minimum, the authors need to acknowledge the weaknesses that I have pointed out in the definition of cis-regulatory logic at a gene level. I think it would be far BETTER if they performed a re-analysis at the level of transcriptional units, which I think might substantially strengthen the work as a whole, but I recognize that this would also constitute a substantial amount of additional effort. Having reached the end of the paper, and considering the evidence and arguments of the authors in their totality, I find myself wondering how much local x background interactions - that is, the effects of cis regulatory mutations (like those being considered here, with or without the modified definitions that I proposed above) IN THE CONTEXT OF A PARTICULAR STRAIN BACKGROUND, might matter more than the effects of the cis regulatory mutations per se. This is a particularly tricky problem to address because it would require a moderate number of targeted experiments with a moderate number of promoters in a moderate number of strains (which of course makes it maximally annoying since one can't simply scale up hugely on either axis individually and really expect to tease things out). I think that trying to address this question experimentally is FAR beyond the scope of the current paper, but I think perhaps the authors could at least begin to address it by acknowledging it as a challenge in their discussion section, and possibly even identify candidate promoters that might show the largest divergence of activities across strains when there IS no detectable cis regulatory mutation (which might be indicative of local x background interactions), or those with the largest divergences of effect for a given mutation across strains. 
A differential expression model incorporating shrinkage is essential in such analysis to avoid putting too much weight on low expression genes with a lot of Poisson noise.

      I also have some more minor concerns and suggestions, which I outline below: It seems that the differential expression analysis treats the lab reference strains as the 'centerpoint' against which everything else is compared, and yet I wonder if this is the best approach... it might be interesting to see how the results differ if the authors instead take a more 'average' strain (either chosen based on genetics or transcriptomics) as a reference and compared everything else to that.

      Line 104 - the statement about the differentially expressed genes being "part of operons with diverse biological functions" seems unclear - it is not apparent whether the authors are referring to diversity of function within each operon, or between the different operons, and in any case one should consider whether the observation reflects any useful information or is just an apparently random collection of operons.

      Line 292 - I find the argument here somewhat unconvincing, for two reasons. First, the fact that only half of the observed changes went in the same direction as the GWAS results would indicate, which is trivially a result that would be expected by random chance, does not lend much confidence to the overall premise of the study that there are meaningful cis regulatory changes being detected (in fact, it seems to argue that the background in which a variant occurs may matter a great deal, at least as much as the cis regulatory logic itself). Second, in order to even assess whether the GWAS is useful to "find the genetic determinants of gene expression changes" as the authors indicate, it would be necessary to compare to a reasonable, non-straw-man, null approach simply identifying common sequence variants that are predicted to cause major changes in sigma 70 binding at known promoters; such a test would be especially important given the lack of directional accuracy observed here. Along these same lines, it is perhaps worth noting, in the discussion beginning on line 329, that the comparison is perhaps biased in favor of the GWAS study, since the validation targets here were prioritized based on (presumably strong) GWAS data.
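The point about directional agreement being what chance alone would give can be made concrete with an exact binomial tail probability under a 50% null. The sketch below is illustrative only: `binomial_tail` is a hand-written helper, and the counts (9 concordant out of 18 tested) are hypothetical stand-ins for "half of the observed changes", not figures taken from the manuscript.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts: validated variants, and how many of them changed
# expression in the direction the GWAS predicted.
n_tested = 18
n_concordant = 9

# Probability of seeing at least this many concordant directions by chance.
p_chance = binomial_tail(n_concordant, n_tested)
```

A tail probability well above conventional significance levels would mean that the observed directional agreement cannot be distinguished from coin flipping, which is the concern raised for line 292.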

      I don't find the Venn diagrams in Fig 7C-D useful or clear given the large number of zero-overlap regions, and would strongly advocate that the authors find another way to show these data.

      In the analysis of waa operon gene expression beginning on line 400, it is perhaps important to note that most of the waa operon doesn't do anything in laboratory K12 strains due to the lack of complete O-antigen... the same is not true, however, for many wild/clinical isolates. It would be interesting to see how those results compare, and also how the absolute TPMs (rather than just LFCs) of genes in this operon vary across the strains being investigated during TOB treatment.

      I don't think that the second conclusion on lines 479-480 is fully justified by the data, given both the disparity in available annotation information between the two species, AND the fact that only two species were considered.

      Line 118: "Double of DEGs"

      Line 288 - presumably these are LOG fold changes

      Fig 6b - legend contains typos

      Line 661 - please report the read count (more relevant for RNA-seq analysis) rather than Gb

      Source code - I appreciate that the authors provide their source code on GitHub, but it is very poorly documented - both a license and some top-level documentation about which code goes with each major operation/conclusion/figure should be provided. Also, ipython notebooks are in general a poor way in my view to distribute code, due to their encouragement of nonlinear development practices; while they are fine for software development, actual complete python programs along with accompanying source data would be preferable.

      Significance

      Overall the key strength of the study is the heroic merging of large genetic and transcriptomic datasets to address the question of how much variation in gene expression can be assigned to cis regulatory mutations in E. coli and in P. aeruginosa. The authors find that only a minority of genes can have such an assignment explaining expression variation, which highlights both the many factors (local and global) impacting gene expression, and the difficulty in trying to predict and understand expression patterns in different strains. I believe that with suitable modification, the manuscript will be of great interest to a broad audience interested in bacterial genomics, gene regulation, and systems/synthetic biology.

      Reviewer Expertise: I consider myself a bacterial systems biologist and routinely use high throughput experiments to understand bacterial gene regulation.

    1. Co-education in priority education: challenges, findings and perspectives

      Executive summary

      This document summarizes a presentation by Pierre Périer, sociologist and professor of educational sciences, on co-education (the shared educational role of schools and families), particularly in working-class neighbourhoods and priority education.

      The analysis highlights a double historical reversal: the shift from a republican school built at a distance from families to a norm of proximity, and the transfer of responsibility for "making the pupil" from the institution to the family.

      Although co-education was written into law in 2013, the concept remains vague for those involved. A major paradox persists: the parents of the pupils who struggle most are often the least involved in the school system.

      The current challenge is not only to address parents' distance from school, but to understand how the institution's functioning and the school's implicit norms contribute to excluding them.

      To remedy this, Périer proposes rebuilding the relationship on four principles: recognition, authorization, explicitness and diversification.

      --------------------------------------------------------------------------------

      1. Context and historical developments

      The relationship between school and families has undergone deep structural change. Pierre Périer identifies two major shifts:

      From distance to proximity: Historically, the school was built at a distance from parents in order to protect the republican space.

      Today the paradigm has reversed, and closeness and active participation have become the norm.

      The professionalization of the parental role: In the past, the school aimed to make the child a "little missionary of modern ideas" able to transform his or her family.

      Today the family is expected to turn the child into a pupil (the "pupil's job", métier d'élève). Academic success has become a central concern for working-class families, often framed as avoiding failure.

      --------------------------------------------------------------------------------

      2. Semantic analysis and diverging perceptions

      Surveys of 1,000 parents and 2,000 teachers reveal significant gaps in how the notion of co-education is understood.

      Overall understanding

      Parents: Two-thirds of parents do not spontaneously know what to associate the term with.

      Teachers: The notion is better known, but is associated with an extremely broad scope (520 different words cited).

      Priority definitions by group

      | Perspective | Priority 1 | Priority 2 |
      | --- | --- | --- |
      | Parents | School instruction and learning (30%) | The child's upbringing (25%) |
      | Teachers | Overall upbringing and the pupil's behaviour (55%) | School instruction (21%) |

      Note: For parents, co-education is a tool to support schooling and learning, whereas for teachers it is mainly a way to ensure that the child behaves in line with institutional expectations.

      --------------------------------------------------------------------------------

      3. The involvement paradox and actor profiles

      Interest in co-education declines as pupils progress through schooling:

      Preschool (maternelle): 65% of teachers say they are very interested.

      Primary school: 55%.

      Lower secondary school (collège): 41%.

      A "parental dropout" is observed at collège, even though this is the period when academic difficulties intensify for the most vulnerable pupils.

      Typology of parents with respect to co-education

      1. "Close" parents (34%): Often more highly educated, members of associations, with children who are doing well. They share a "cultural complicity" with the school.

      2. "Distant" or "hindered" parents (47%): Interested in the principle but little or not at all involved in practice.

      3. "Invisible" parents (20%): Often in precarious circumstances, living in rural areas or priority neighbourhoods, with children at collège or struggling at school. For them, the notion is entirely unclear.

      --------------------------------------------------------------------------------

      4. Obstacles and barriers to co-education

      The analysis stresses that parents' absence does not mean a lack of interest, but often results from structural and symbolic barriers.

      Symbolic domination: Parents in precarious situations fear being caught out on their command of the language or of social codes ("knowing how to speak well so as not to be judged").

      The relationship to time: Making an appointment presupposes control over planned, scheduled time.

      Vulnerable families, however, often live in "chaotic" time or in a state of urgency.

      Delegitimization through homework: Outsourcing schoolwork to the home worsens inequalities.

      Parents who want to help but do not master the methods experience a "symbolic disqualification" in front of their children.

      The norm of the "pupil's parent": The institution implicitly defines a model of the ideal parent.

      Those who depart from it are quickly labelled as having "given up", when in reality they are overexposed to institutional judgement as soon as a problem arises.

      "These are parents whom the school pushes away, more than parents who are far from the school."

      --------------------------------------------------------------------------------

      5. Lessons from the lockdown (COVID-19)

      The health crisis acted as both a revealer and an accelerator of existing trends:

      Worsening inequalities: Housing conditions and the inability to help with homework created extreme tensions within families.

      Discovering the human side: The use of the telephone made it possible to break through institutional coldness.

      Some parents experienced, for the first time, a "human relationship" with teachers, based on protected and caring conversation.

      Mutual recognition: The lockdown led parents to value teachers' work more, and made the school realize that contact with so-called "distant" families was possible.

      --------------------------------------------------------------------------------

      6. Principles for equitable action

      To build genuine co-education, Pierre Périer proposes four guiding principles:

      1. Principle of recognition

      Equality: Identical rights to information and identical status.

      Merit: Acknowledge and reward the actual contribution of each parent.

      Trust: It cannot be decreed; it follows from recognition.

      2. Principle of authorization

      • Legitimize "real parents" (as they are) rather than fictional parents.

      • Move from "doing for" parents to "doing with" them, or even "starting from" their expectations.

      • Create dedicated spaces (parents' cafés, mediation venues) to symbolically make room for them.

      3. Principle of explicitness

      • Clarify roles: who does what?

      • Avoid implicit expectations that benefit only parents who are already initiated. The more explicit the code, the more equal the relationship.

      4. Principle of diversification

      • Multiply the channels of communication (speech, telephone, video, objects that travel between school and home).

      • Rely on mediators (relay parents, popular education associations) to maintain the link with those who stay on the margins of the institution.

      --------------------------------------------------------------------------------

      Conclusion: Tools and perspectives for success

      The survey shows that, according to those involved, pupils' success depends on three major levers:

      1. Smaller class sizes (for closer attention to pupils in difficulty).

      2. The development of co-education.

      3. More supervised study time.

      Co-education must be conceived as a collective lever rather than an individual matter, relying on concrete tools (classroom videos, shared games, communication guides) that circulate knowledge between school and home.

    1. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Azur et al seek to determine the role of Imp1/Igf2bp1 in regulating the temporal generation of cortical neuron types. The authors showed that overexpression of Imp1 changes the laminar distribution of cortical neurons and suggest that Imp1 plays a temporal role in specifying cell fates.

      Strengths:

      The study uniquely used TEMPO to investigate the temporal effects of Imp1/Igf2bp1 in cortical development. The disrupted laminar distribution and delayed fate transition are interesting. The results are presented with proper quantification, they are generally well interpreted, and suggest important roles for Imp1.

      Weaknesses:

      (1) While the results suggest Imp1 is important in regulating cortical neurogenesis, it remains unclear when and where it is expressed to execute such temporal functions. For instance, where is Imp1 expressed in the developing brain? Is it specific to the radial glial cells or ubiquitous in progenitors and neurons? Does it show temporal expression in RGCs?

      (2) The advantage and interpretation of TEMPO need further clarification. TEMPO is an interesting method and appears useful in simultaneously labelling cells and controlling gene expression. Since the reporter, Cas9, and gRNA triggers are all driven by ubiquitous promoters and integrated into the genome using piggyBac, it appears logical that the color transition should happen in all cells over time. The color code appears to track the time when the plasmids got integrated instead of the birthday of neurons. Is this logically true? If the TEMPO system is introduced into postmitotic neurons and the CAG promoter is not silenced, would the tri-color transition happen?

      (3) The accumulation of neurons at the subplate region would benefit from showing larger views of the affected hemisphere. IUE is invasive. The glass pipette may consistently introduce focal damage and truncate RGCs. It is important to examine slices covering the whole IUE region.

    1. R0:

      Reviewer #1: Title: Probabilistic Forecasting of Monthly Dengue Cases Using Epidemiological and Climate Signals: A BiLSTM–Naive Bayes Model Versus Mechanistic and Count-Model Baselines. Manuscript Number: PGPH-D-25-03170

      This manuscript presents a rigorous comparative study of probabilistic forecasting models for monthly dengue incidence in Freetown, Sierra Leone, covering the period 2015–2025. It evaluates four major model classes—NB-GLM, INGARCH-NB, Renewal-NB, and BiLSTM-NB—under a leakage-safe rolling-origin evaluation. The article demonstrates strong methodological maturity, careful control of data leakage, and thorough probabilistic evaluation using proper scoring rules, interval coverage, sharpness metrics, PIT diagnostics, and Diebold–Mariano tests. The manuscript is generally well-written, technically sound, and addresses an important operational public health problem. It positions itself as one of the few works offering aligned comparisons of mechanistic, statistical, and deep-learning models under realistic constraints for West African dengue surveillance. This article presents a methodologically rigorous comparison of four probabilistic forecasting approaches—NB-GLM, INGARCH-NB, Renewal-NB, and BiLSTM-NB—applied to monthly dengue case data from Freetown, Sierra Leone (2015–2025). The study addresses an important gap by evaluating mechanistic, statistical, and deep-learning models under aligned, leakage-safe conditions. While the work is comprehensive and technically strong, several critical issues affect its accessibility, interpretability, and broader applicability.

      Strengths The study excels in methodological rigor. Its strict leakage safeguards, careful feature-timing rules, and use of expanding-window rolling-origin evaluation significantly strengthen reliability. The inclusion of proper scoring rules, interval coverage, sharpness metrics, PIT histograms, and Diebold–Mariano tests provides a complete probabilistic evaluation rarely seen in dengue forecasting studies. The horizon-specific findings—INGARCH-NB outperforming at 1–2 months and BiLSTM-NB excelling at 3 months—are well supported by aligned comparisons and statistical significance tests. The transparency of data, code, and alignment artefacts enhances reproducibility and credibility. Additionally, the manuscript offers practical guidance for operational forecasting, including a realistic “light climate” input strategy suitable for resource-limited settings.
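
      The expanding-window rolling-origin evaluation praised above can be sketched as follows. This is only a minimal illustration of the general technique, not the manuscript's code: the function name `rolling_origin_splits`, its parameters, and the toy sizes (10 months, 6 in the first training window) are assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_obs: int, initial_train: int, horizon: int
) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, target_index) pairs for an expanding window.

    Each forecast origin uses only observations strictly before the origin,
    so the target month (origin + horizon - 1) is never seen during fitting.
    """
    if initial_train < 1 or horizon < 1:
        raise ValueError("initial_train and horizon must be positive")
    origin = initial_train
    while origin + horizon - 1 < n_obs:
        train = list(range(origin))      # months 0 .. origin - 1
        target = origin + horizon - 1    # month being forecast
        yield train, target
        origin += 1                      # expand the window by one month


# Hypothetical example: 10 monthly observations, 6 used for the first fit,
# forecasting 3 months ahead.
splits = list(rolling_origin_splits(n_obs=10, initial_train=6, horizon=3))
for train, target in splits:
    print(f"train on months 0..{train[-1]}, forecast month {target}")
```

      Any feature built from the series (lags, rolling means, imputed values) would need to be computed inside each training window to keep the evaluation leakage-safe.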

      Limitations Despite its strengths, the manuscript is heavily technical, with extensive mathematical exposition in the main text. This may limit accessibility for public-health practitioners who are likely part of the target audience. The mechanistic renewal model is presented as a baseline but is arguably underspecified; the use of a short, fixed 3-month kernel may not realistically capture dengue’s generation interval dynamics, likely contributing to its poor performance. This limits the interpretive value of the mechanistic comparison. This limitation should be addressed. The study’s climate treatment, while intentionally conservative, may underexploit important environmental drivers. Although justified operationally, this constraint restricts exploration of potentially meaningful lag structures or seasonal climate anomalies. The analysis is limited to a single city and monthly data frequency, raising questions about generalizability across geographies with different climate patterns and dengue transmission dynamics. Moreover, the monthly temporal resolution may obscure rapid outbreak shifts, possibly disadvantaging mechanistic and hybrid models that rely on finer-grained dynamics. This should be addressed. The manuscript makes a valuable and original contribution to dengue forecasting, offering robust methodological innovations and practical insights for real-time surveillance systems. However, improved clarity, stronger justification for mechanistic assumptions, and expanded discussion of generalizability would enhance its usefulness and scholarly impact. With revisions to improve accessibility and contextual depth, the study is well positioned for publication and for informing operational forecasting practice in similar settings.

      Reviewer #2: 1. What does PIT in the abstract stand for? The authors should avoid using abbreviations in the abstract. 2. The authors should provide some additional analysis, such as experimenting with alternative or longer serial-interval kernels, or simple sensitivity checks (e.g., different window lengths or, if possible, finer temporal resolution). 3. Please justify the small climate feature set, mentioning any exploratory work with larger sets. 4. The authors should add a clearly labelled missing-data handling subsection that specifies the imputation method, the number of imputed months, and how they were used in training/evaluation, plus any sensitivity analysis. 5. While the architecture, optimization, and calibration steps are described, the process for choosing hyperparameters is not fully audit-ready. 6. I recommend that the authors conduct an additional experiment to demonstrate the generalizability of the proposed model.

    1. EIP-140

      Introduces the REVERT opcode, which lets a contract revert a failed operation without consuming all of the gas, unlike INVALID, the opcode used before it, which consumes all gas. REVERT was added to make debugging possible and to return the remaining gas.
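
      The difference in gas handling can be illustrated with a toy model. This is a sketch of the accounting only, not an EVM implementation: `CallResult` and `end_call` are hypothetical names, and the rollback of state changes (which happens in both cases) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    success: bool
    gas_left: int        # gas handed back to the caller
    return_data: bytes   # data made available to the caller


def end_call(opcode: str, gas_remaining: int, data: bytes = b"") -> CallResult:
    """Toy model of how a failing call frame ends."""
    if opcode == "REVERT":
        # State changes are undone, but unused gas and return data survive.
        return CallResult(False, gas_remaining, data)
    if opcode == "INVALID":
        # Exceptional halt: state changes are undone and all gas is consumed.
        return CallResult(False, 0, b"")
    raise ValueError(f"unsupported opcode: {opcode}")


reverted = end_call("REVERT", gas_remaining=40_000, data=b"insufficient balance")
invalid = end_call("INVALID", gas_remaining=40_000)
print(reverted)
print(invalid)
```

      The returned data is what makes REVERT useful for debugging: the caller can learn why the call failed instead of only seeing that it did.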

    1. Use Cases

      UC code: CAT-UC-01-01. Capability: CAT-CAP-01. Use case: Define Technical Capability. Actor: MVNA Admin. Description: Register a global technical capability in the catalog's master dictionary so that it can be used when building Service Profiles. Phase: MVP.

      Normalized Administrative Flows

      FAN-CAT-01: Register Technical Capability

      Used by: CAT-UC-01-01

      Preconditions: The capability's code must not already exist in the CapabilityLibrary (uniqueness).

      Canonical steps:

      (1) The normalized code, description, and category are entered.

      (2) The system must canonicalize the code (trim, uppercase) to avoid duplicates caused by formatting differences.

      (3) The system validates that the category belongs to the defined Enum: NETWORK, FEATURE, or RESTRICTION.

      (4) The Capability entity is registered with active: true by default.

      (5) The system emits the CapabilityDefined domain event.

      Result: The Capability is available in the CapabilityLibrary to be referenced in any ProfileSpec.

      FAN-CAT-02: Modify Technical Capability

      Preconditions: The Capability exists in the CapabilityLibrary.

      Canonical steps:

      (1) The system blocks editing of the code field. The code is immutable once created so that references in the ProfileSpecs are not broken.

      (2) The description or the category is modified.

      (3) The changes are saved. This action does not affect active Service Profiles, because they reference the code, which has not changed.

      (4) The system emits the CapabilityUpdated domain event.

      FAN-CAT-03: Deactivate Technical Capability

      Preconditions: The Capability exists and is in the active: true state.

      Canonical steps:

      (1) The system marks the Capability as active: false.

      (2) From this moment on, the capability is no longer visible or selectable in the CAT-UC-06 flow, "Configure Technical Definition of a Version".

      (3) The system does not delete the capability from the database or from the Service Profile versions (active or inactive) that already contain it. This guarantees that the history and the technical provisioning keep working for active SIMs.

      (4) The system emits the CapabilityDeactivated domain event.
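
      The registration and deactivation flows (FAN-CAT-01 and FAN-CAT-03) can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: the class layout, the `events` list standing in for a domain event bus, and the example capability "VOLTE" are hypothetical; only the entity names, the category values, and the event names come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Category(Enum):
    NETWORK = "NETWORK"
    FEATURE = "FEATURE"
    RESTRICTION = "RESTRICTION"


@dataclass
class Capability:
    code: str
    description: str
    category: Category
    active: bool = True   # registered as active by default


class CapabilityLibrary:
    def __init__(self) -> None:
        self._items: Dict[str, Capability] = {}
        self.events: List[str] = []   # stand-in for a domain event bus

    def register(self, code: str, description: str, category: str) -> Capability:
        canonical = code.strip().upper()   # trim + uppercase
        if not canonical:
            raise ValueError("code must not be empty")
        if canonical in self._items:       # uniqueness precondition
            raise ValueError(f"capability {canonical} already exists")
        # Category(...) raises ValueError for values outside the Enum.
        cap = Capability(canonical, description, Category(category))
        self._items[canonical] = cap
        self.events.append("CapabilityDefined")
        return cap

    def deactivate(self, code: str) -> None:
        cap = self._items[code.strip().upper()]
        if not cap.active:
            raise ValueError("capability is already inactive")
        cap.active = False                 # soft delete: the record is kept
        self.events.append("CapabilityDeactivated")


library = CapabilityLibrary()
cap = library.register("  volte ", "Voice over LTE", "FEATURE")
library.deactivate("VOLTE")
print(cap, library.events)
```

      Keeping the record and only flipping `active` mirrors the requirement that existing Service Profile versions continue to resolve the code after deactivation.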

      REMOVE THE ENTIRE CAPABILITY, since the capabilities should be preloaded in an entity, with the attributes that serviceprofile needs.

    1. Reviewer #2 (Public review):

      The article is very well written, and the new methodology is presented with care. I particularly appreciated the step-by-step rationale for establishing the approach, such as the relationship between K-means centers and the various parameters. This text is conveniently supported by the flow charts and t-SNE plots. Importantly, I thought the choice of state-of-the-art method was appropriate and the choice of dataset adequate, which together convinced me of the large improvement reported. I thought that the crossmodal feature-engineering solution proposed was elegant and seems exportable to other fields. Here are a few notes.

      While the validation data set was well chosen and of high quality, it remains a single dataset and also remains a non-recurrent network. The authors acknowledge this in the discussion, but I wanted to chime in to say that for the method to be more than convincing, it would need to have been tested on more datasets. It should be acknowledged that the problem becomes more complicated in a recurrent excitatory network, and thus the method may not work as well in the cortex or in CA3.

      While the data is shown to work in this particular dataset (plus the two others at the end), I was left wondering when the method breaks. And it should break if the models are sufficiently mismatched. Such a question can be addressed using synthetic-synthetic models. This was an important intuition that I was missing, and an important check on the general nature of the method that I was missing.

      While the choice of state-of-the-art is good in my opinion, I was looking for comments on the methods prior to that. For instance, methods based on GLMs have been used by the Pillow, Paninski, and Truccolo groups. I could not find a decent discussion of these methods in the main text and thought that both their acknowledgement and the rationale for dismissing them were missing.

      While most of the text was very clear, I thought that page 11 was odd and missing much in terms of introductions. Foremost is the introduction of the dataset, which is never really done. Page 11 refers to 'this dataset', while the previous sentence was saying that having such a dataset would be important and is challenging. The dataset needs to be properly described: what's the method for labeling, what's the brain area, what were the spike recording methodologies, what is meant by two labeling methodologies, what do we know about the idiosyncrasies of the particular network the recording came from (like CA1 is non-recurrent, so which connections)? I was surprised to see 'English et al.' cited in text only on page 13 since their data has been hailed from the beginning.

      Further elements that needed definition are Nsyn and i, which were not defined in the context of Equations 2-3: I was not sure if they referred to different samples or different variants of the synthetic model. I also would have preferred having the function f defined earlier, as it is defined for Equation 3, but appears in Equation 2.

      When the loss functions are described, it would be important to define 'data' and 'labels' here. This machine learning jargon has a concrete interpretation in this context, and making this concrete would be very important for the readership.

      While I appreciated that there was a section on robustness, I did not find that the features studied were the most important. In this context, I was surprised that the other datasets were relegated to supplementary, as these appeared more relevant.

      Some of the figures have text that is too small. In particular, Figure 2 has text that is way too small. It seemed to me that the pseudo code could stand alone, and the screenshot of the equations did not need to be repeated in a figure, especially if their size becomes so small that we can't even read them.

    2. Author response:

      General Response

      We thank the reviewers for their positive assessment of our work and for acknowledging the timeliness of the problem and the novelty of using domain adaptation to address model mismatch. We appreciate the constructive feedback regarding validation and clarity. In the revised manuscript, we will address these points as follows:

      (1) Systematic Validation: We will design and perform systematic in silico experiments to evaluate the method beyond the single in vivo dataset, including robustness tests regarding recording length and network synchrony.

      (2) Recurrent Networks & Failure Analysis: We will test our method on synthetic datasets generated from highly recurrent networks and analyze exactly when the method breaks as a function of mismatch magnitude.

      (3) Method Comparisons: We will report the Matthews Correlation Coefficient (MCC) for the approach by English et al. (2017) and expand our comparison and discussion of GLM-based methods.

      (4) Clarifications: We will rigorously define the dataset details (labeling, recording methodology), mathematical notation, and machine learning terminology ('data', 'labels').

      (5) Discussion of Limitations: We will explicitly discuss the challenges and limitations inherent in generalizing to more recurrently connected regions.

      Below are our more detailed responses:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The validation of the approach is incomplete: due to its very limited size, the single ground-truth dataset considered does not provide a sufficient basis to draw a strong conclusion. While the authors correctly note that this is the only dataset of its kind, the value of this validation is limited compared to what could be done by carefully designing in silico experiments.

      We thank the reviewer for acknowledging the scarcity of suitable in vivo ground-truth datasets and the limitations this poses. We agree that additional validation is necessary to draw strong conclusions. In the revised manuscript, we will systematically design and perform in silico experiments for evaluations beyond the single in vivo dataset.

      (2) Surprisingly, the authors fail to compare their method to the approach originally proposed for the data they validate on (English et al., 2017).

      We agree that this is an essential comparison. We will report the Matthews Correlation Coefficient (MCC) result of the approach by English et al. (2017) on the spontaneous period of the recording.
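
      For reference, the Matthews Correlation Coefficient is computed from the confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a generic implementation; the counts are hypothetical and only illustrate why MCC suits sparse connectivity, where most neuron pairs are unconnected.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0   # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 1000 candidate neuron pairs, 20 truly connected.
print(round(matthews_corrcoef(tp=15, tn=970, fp=10, fn=5), 3))

# A detector that labels every pair "unconnected" is 98% accurate here,
# yet its MCC is 0 because it never identifies a connection.
print(matthews_corrcoef(tp=0, tn=980, fp=0, fn=20))
```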

      (3) The authors make a commendable effort to study the method's robustness by pushing the limits of the dataset. However, the logic of the robustness analysis is often unclear, and once again, the limited size of the dataset poses major limitations to the authors.

      We appreciate the reviewer recognizing our initial efforts to evaluate robustness. In our original draft, we tested recording length, network model choices, and analyzed failure cases. However, we agree that the limited real data restricts the scope of these tests. To address this, we will perform more systematic robustness tests on the newly generated synthetic datasets in the revised version, allowing us to evaluate performance under a wider range of conditions.

      (4) The lack of details concerning both the approach and the validation makes it challenging for the reader to establish the technical soundness of the study.

      We will revise the manuscript thoroughly to better present the methodology of our framework and the validation pipelines. We will ensure that the figures and text clearly articulate the technical details required to assess the soundness of the study.

      Although in the current form this study does not provide enough basis to judge the impact of DeepDAM in the broader neuroscience community, it nevertheless puts forward a valuable and novel idea: using domain adaptation to mitigate the problem of model mismatch. This approach might be leveraged in future studies and methods to infer connectivity.

      We thank the reviewer again for acknowledging the novelty and importance of our work.

      Reviewer #2 (Public review):

      While the validation data set was well chosen and of high quality, it remains a single dataset and also remains a non-recurrent network. The authors acknowledge this in the discussion, but I wanted to chime in to say that for the method to be more than convincing, it would need to have been tested on more datasets. It should be acknowledged that the problem becomes more complicated in a recurrent excitatory network, and thus the method may not work as well in the cortex or in CA3.

      We will carefully revise our text to specifically discuss this limitation and the challenges inherent in generalizing to more recurrently connected regions. Furthermore, to empirically address this concern, we will test our method extensively on synthetic datasets generated from highly recurrent networks to quantify performance in these regimes.

      While the data is shown to work in this particular dataset (plus the two others at the end), I was left wondering when the method breaks. And it should break if the models are sufficiently mismatched. Such a question can be addressed using synthetic-synthetic models. This was an important intuition that I was missing, and an important check on the general nature of the method that I was missing.

      We thank the reviewer for this insight regarding the general nature of the method. While we previously analyzed failure cases regarding strong covariation and low spike counts, we agree that a systematic analysis of mismatch magnitude is missing. Building on our planned experiments with synthetic data, we will analyze and discuss exactly when the method breaks as a function of the mismatch magnitude between datasets.

      While the choice of state-of-the-art is good in my opinion, I was looking for comments on the methods prior to that. For instance, methods based on GLMs have been used by the Pillow, Paninski, and Truccolo groups. I could not find a decent discussion of these methods in the main text and thought that both their acknowledgement and the rationale for dismissing them were missing.

      As the reviewer noted, we extensively compared our method with a GLM-based method (GLMCC) and CoNNECT, whose superiority over other GLM-based methods, such as the extended GLM method (Ren et al., 2020, J Neurophysiol), has already been demonstrated in their papers (Endo et al., Sci Rep, 2021). However, we acknowledge that the discussion of the broader GLM literature was insufficient. To make the comparison more thorough, we will conduct comparisons with additional GLM-based methods and include a detailed discussion of these approaches.

      Endo, D., Kobayashi, R., Bartolo, R., Averbeck, B. B., Sugase-Miyamoto, Y., Hayashi, K., ... & Shinomoto, S. (2021). A convolutional neural network for estimating synaptic connectivity from spike trains. Scientific Reports, 11(1), 12087.

      Ren, N., Ito, S., Hafizi, H., Beggs, J. M., & Stevenson, I. H. (2020). Model-based detection of putative synaptic connections from spike recordings with latency and type constraints. Journal of Neurophysiology, 124(6), 1588-1604.

      While most of the text was very clear, I thought that page 11 was odd and missing much in terms of introductions. Foremost is the introduction of the dataset, which is never really done. Page 11 refers to 'this dataset', while the previous sentence was saying that having such a dataset would be important and is challenging. The dataset needs to be properly described: what's the method for labeling, what's the brain area, what were the spike recording methodologies, what is meant by two labeling methodologies, what do we know about the idiosyncrasies of the particular network the recording came from (like CA1 is non-recurrent, so which connections)? I was surprised to see 'English et al.' cited in text only on page 13 since their data has been hailed from the beginning.

      Further elements that needed definition are Nsyn and i, which were not defined in the context of Equations 2-3: I was not sure if they referred to different samples or different variants of the synthetic model. I also would have preferred having the function f defined earlier, as it is defined for Equation 3, but appears in Equation 2.

      When the loss functions are described, it would be important to define 'data' and 'labels' here. This machine learning jargon has a concrete interpretation in this context, and making this concrete would be very important for the readership.

      We thank the reviewer for these constructive comments on the writing. We will clarify the introduction of the dataset (labeling method, brain area, recording methodology) and ensure all mathematical terms (such as Nsyn, i, and function f) and machine learning terminology (definitions of 'data' and 'labels' in this context) are rigorously defined upon first use in the revised manuscript.

      While I appreciated that there was a section on robustness, I did not find that the features studied were the most important. In this context, I was surprised that the other datasets were relegated to supplementary, as these appeared more relevant.

      Robustness is an important aspect of our framework to demonstrate its applicability to real experimental scenarios. We specifically analyzed how synchrony between neurons, the number of recorded spikes and the choice of the network influence the performance of our method. We also agree that these aspects are limited by the one dataset we evaluated on. Therefore, we will test the robustness of our method more systematically on synthetic datasets.

      With the more extensive analysis on synthetic datasets, we believe that the results on inferring biophysical properties of single-neuron and microcircuit models should remain in the supplement, so that the main figures focus purely on synaptic connectivity inference.

      Some of the figures have text that is too small. In particular, Figure 2 has text that is way too small. It seemed to me that the pseudo code could stand alone, and the screenshot of the equations did not need to be repeated in a figure, especially if their size becomes so small that we can't even read them.

      We will remove the pseudo-code and equations from Figure 2 to improve readability. The pseudo-code will be presented as a distinct box in the main text.

    1. Reviewer #1 (Public review):

      Summary:

      The authors develop a Python-based analysis framework for cellular organelle segmentation, feature extraction, and analysis for live-cell imaging videos. They demonstrate that their pipeline works for two organelles (mitochondria and lysosomes) and provide a step-by-step overview of the AutoMorphoTrack package.

      Strengths:

      The authors provide evidence that the package is functional and can provide publication-quality data analysis for mitochondrial and lysosomal segmentation and analysis.

      Weaknesses:

      (1) I was enthusiastic about the manuscript as a good end-to-end cell/organelle segmentation and quantification pipeline that is open-source, and is indeed useful to the field. However, I'm not certain AutoMorphoTrack fully fulfills this need. It appears to stitch together basic FIJI commands in a Python script that an experienced user can put together within a day. The paper reads as a documentation page, and the figures seem to be individual analysis outputs of a handful of images. Indeed, a recent question on the image.sc forum prompted similar types of analysis and outputs as a simple service to the community, and with seemingly better results and integrated organelle identity tracking (which is necessary in my opinion for live imaging). I believe this is a better fit in the methods section of a broader work. https://forum.image.sc/t/how-to-analysis-organelle-contact-in-fiji-with-time-series-data/116359/5.

      (2) The authors neither discuss nor compare against other pipelines that can accomplish similar analyses, such as Imaris or CellProfiler, nor do they integrate segmentation options such as CellPose or StarDist.

      (3) Although LLM-based chatbot integration seems to have been added for novelty, the authors neither demonstrate it in the manuscript nor provide instructions that make it easy to implement, even though it is presumably directed at users who do not code.

    2. Reviewer #2 (Public review):

      Summary:

      AutoMorphoTrack provides an end-to-end workflow for organelle-scale analysis of multichannel live-cell fluorescence microscopy image stacks. The pipeline includes organelle detection/segmentation, extraction of morphological descriptors (e.g., area, eccentricity, "circularity," solidity, aspect ratio), tracking and motility summaries (implemented via nearest-neighbor matching using cKDTree), and pixel-level overlap/colocalization metrics between two channels. The manuscript emphasizes a specific application to live imaging in neurons, demonstrated on iPSC-derived dopaminergic neuronal cultures with mitochondria in channel 0 and lysosomes in channel 1, while asserting adaptability to other organelle pairs.
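
      To make the tracking step concrete, nearest-neighbor linking between two consecutive frames can be sketched as below. This is not the package's code: it uses a brute-force search in place of the cKDTree lookup mentioned above, and the one-to-one constraint, the `max_dist` cutoff, and the centroid coordinates are assumptions made for the illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def link_frames(prev: List[Point], curr: List[Point], max_dist: float) -> Dict[int, int]:
    """Greedy nearest-neighbour linking between two frames.

    Returns {index_in_prev: index_in_curr}. Candidate pairs are visited from
    closest to farthest so that each centroid is used at most once.
    """
    candidates = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
        if math.dist(p, c) <= max_dist
    )
    links: Dict[int, int] = {}
    used_curr = set()
    for _, i, j in candidates:
        if i not in links and j not in used_curr:
            links[i] = j
            used_curr.add(j)
    return links


# Hypothetical organelle centroids (pixels) in two consecutive frames.
frame_a = [(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]
frame_b = [(10.5, 9.5), (0.4, 0.3), (80.0, 80.0)]
links = link_frames(frame_a, frame_b, max_dist=3.0)
print(links)
```

      The third object has no partner within the cutoff and stays unlinked, which is the kind of case (appearance, disappearance, fission, fusion) that a validation against ground truth would need to cover.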

      The tool is positioned for cell biologists, including users with limited programming experience, primarily through two implemented modes of use: (i) a step-by-step Jupyter notebook and (ii) a modular Python package for scripted or batch execution, alongside an additional "AI-assisted" mode that is described as enabling analyses through natural-language prompts.

      The motivation and general workflow packaging are clear, and the notebook-plus-modules structure is a reasonable engineering choice. However, in its current form, the manuscript reads more like a convenient assembly of standard methods than a validated analytical tool. Key claims about robustness, accuracy, and scope are not supported by quantitative evidence, and the 'AI-assisted' framing is insufficiently defined and attributes to the tool capabilities that are provided by external LLM platforms rather than by AutoMorphoTrack itself. In addition, several figure, metric, and statistical issues, including physically invalid plots and inconsistent metric definitions, directly undermine trust in the quantitative outputs.

      Strengths:

      (1) Clear motivation: lowering the barrier for organelle-scale quantification for users who do not routinely write custom analysis code.

      (2) Multiple entry points: an interactive notebook together with importable modules, emphasizing editable parameters rather than a fully opaque black box.

      (3) End-to-end outputs: automated generation of standardized visualizations and tables that, if trustworthy, could help users obtain quantitative summaries without assembling multiple tools.

      Weaknesses:

      (1) "AI-assisted / natural-language" functionality is overstated.

      The manuscript implies an integrated natural-language interface, but no such interface is implemented in the software. Instead, users are encouraged to use external chatbots to help generate or modify Python code or execute notebook steps. This distinction is not made clearly and risks misleading readers.

      (2) No quantitative validation against trusted ground truth.

      There is no systematic evaluation of segmentation accuracy, tracking fidelity, or interaction/overlap metrics against expert annotations or controlled synthetic data. Without such validation, accuracy, parameter sensitivity, and failure modes cannot be assessed.

      (3) Limited benchmarking and positioning relative to existing tools.

      The manuscript does not adequately compare AutoMorphoTrack to established platforms that already support segmentation, morphometrics, tracking, and colocalization (e.g., CellProfiler) or to mitochondria-focused toolboxes (e.g., MiNA, MitoGraph, Mitochondria Analyzer). This is particularly problematic given the manuscript's implicit novelty claims.

      (4) Core algorithmic components are basic and likely sensitive to imaging conditions.

      Heavy reliance on thresholding and morphological operations raises concerns about robustness across varying SNR, background heterogeneity, bleaching, and organelle density; these issues are not explored.

      (5) Multiple figure, metric, and statistical issues undermine confidence.

      The most concerning include:

      (i) "Circularity (4πA/P²)" values far greater than 1 (Figures 2 and 7, and supplementary figures), which is inconsistent with the stated definition and strongly suggests a metric/label mismatch or computational error.

      (ii) A displacement distribution extending to negative values (Figure 3B). This is likely a plotting artifact (e.g., KDE boundary bias), but as shown, it is physically invalid and undermines confidence in the motility analysis.

      (iii) Colocalization/overlap metrics that are inconsistently defined and named, with axis ranges and terminology that can mislead (e.g., Pearson r reported for binary masks without clarification).

      (iv) Figure legends that do not match the displayed panels, and insufficient reporting of Ns, p-values, sampling units, and statistical assumptions.
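
      Two of the points above, (i) and (iii), can be checked numerically. The sketch below uses idealized shapes and made-up pixel masks, not the manuscript's data, and the helper names are hypothetical. It shows that 4πA/P² reaches 1 only for a circle, and that Pearson r on binary masks depends on how much empty background is included while an overlap metric such as Dice does not.

```python
import math
from typing import Sequence


def circularity(area: float, perimeter: float) -> float:
    """4*pi*A / P**2; equals 1 for a circle and is smaller for other shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    overlap = sum(a and b for a, b in zip(x, y))
    return 2.0 * overlap / (sum(x) + sum(y))


r = 3.0
print(circularity(math.pi * r ** 2, 2 * math.pi * r))   # circle
print(circularity(1.0, 4.0))                             # unit square
print(circularity(10.0, 22.0))                           # 10 x 1 rectangle

# Flattened binary masks (hypothetical pixels, 1 = organelle present).
mito = [1, 1, 1, 0, 0, 0, 0, 0]
lyso = [0, 1, 1, 1, 0, 0, 0, 0]
print(pearson(mito, lyso), dice(mito, lyso))

# Same objects, larger field of view: only empty background was added.
pad = [0] * 8
print(pearson(mito + pad, lyso + pad), dice(mito + pad, lyso + pad))
```

      Pixel-based perimeter estimates can push circularity slightly above 1 for very small objects, but that effect would not by itself explain values far greater than 1.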

    3. Reviewer #3 (Public review):

      Summary:

      AutoMorphoTrack is a Python package for quantitatively evaluating organelle shape, movement, and colocalization in high-resolution live cell imaging experiments. It is designed to be a beginning-to-end workflow from segmentation through metric graphing, which is easy to implement. The paper shows example results from their images of mitochondria and lysosomes within cultured neurons, demonstrating how it can be used to understand organelle processing.

      Strengths:

      The text is well-written and easy to follow. I particularly appreciate tables 1 and 2, which clearly define the goals of each module, the tunable parameters, and the input and outputs. I can see how the provided metrics would be useful to other groups studying organelle dynamics. Additionally, because the code is open-source, it should be possible for experienced coders to use this as a backbone and then customize it for their own purposes.

      Weaknesses:

      Unfortunately, I was not able to install the package to test it myself using any standard install method. This is likely fixable by the authors, but until a functional distribution exists, the utility of this tool is highly limited. I would be happy to re-review this work after this is fixed.

      The authors claim that there is "AI-Assisted Execution and Natural-Language Interface". However, this is never defended in any of the figures, and from quickly reviewing the .py files, there does not seem to be any built-in support or interface for this. Without significantly more instructions on how to connect this package to a (free) LLM, along with data to prove that this works reproducibly to produce equivalent results, this section should be removed.

      Additionally, I have a few suggestions/questions:

      (1) Red-green images are difficult for colorblind readers. I recommend that the authors change all raw microscopy images to a different color combination.

      (2) For all of the velocity vs. displacement graphs (Figure 3C and subpart G of every supplemental figure), there is a diagonal line clearly defining a minimum limit of detected movement. Is this a feature of the dataset (drift, shakiness, etc.) or some sort of minimum movement threshold in the tracking algorithm? This should be discussed in the text.

      (3) Integrated Correlation Summary (Figure 5) - Pearson is likely the wrong metric for most of these metric pairs because even interesting relationships may be non-linear. Please replace with Spearman correlation, which is less dependent on linearity.
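
      The distinction in point (3) can be illustrated with a small example. The data below are synthetic (y = exp(x)), and the helper functions are written out only so that the sketch is self-contained; they are not taken from the package.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """Average ranks (1-based), so ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic monotonic but strongly non-linear relationship.
x = [float(i) for i in range(1, 11)]
y = [math.exp(v) for v in x]
print(pearson(x, y), spearman(x, y))
```

      Spearman captures any monotonic relationship, so it is perfect here while Pearson is not. A non-monotonic relationship (for example, U-shaped) would still be missed by both.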

    4. Author response:

      Reviewer #1

      We thank the reviewer for their thoughtful and constructive assessment of AutoMorphoTrack and for recognizing its potential utility as an open-source end-to-end workflow for organelle analysis.

      (1) Novelty and relationship to existing tools / FIJI workflows

      We appreciate this concern and agree that many of the underlying image-processing operations (e.g., thresholding, morphological cleanup, region properties) are well-established. Our goal with AutoMorphoTrack is not to introduce new segmentation algorithms, but rather to provide a curated, reproducible, and extensible end-to-end workflow that integrates segmentation, morphology, tracking, motility, and colocalization into a single, transparent pipeline tailored for live-cell organelle imaging.

      While an experienced user could assemble similar analyses ad hoc using FIJI or custom scripts, our contribution lies in:

      Unifying these steps into a single workflow with consistent parameterization and outputs,

      Generating standardized, publication-ready visualizations and tables at each step,

      Enabling batch and longitudinal analyses across cells and conditions, and

      Lowering the barrier for users who do not routinely write custom analysis code.

      We note that the documentation-style presentation of the manuscript is intentional, as it serves both as a methods paper and a practical reference for users implementing the workflow. We agree, however, that the manuscript currently overemphasizes step-by-step execution at the expense of positioning. In revision, we will more explicitly frame AutoMorphoTrack as a workflow integration and usability contribution, rather than a fundamentally new algorithmic advance.

      We will also cite and discuss the image.sc example referenced by the reviewer, clarifying conceptual overlap and differences in scope.

      (2) Comparison to existing pipelines (Imaris, CellProfiler, CellPose, StarDist)

      We agree and thank the reviewer for highlighting this omission. In the revised manuscript, we will expand the related-work and positioning section to explicitly compare AutoMorphoTrack with established commercial (e.g., Imaris) and open-source (e.g., CellProfiler, MiNA, MitoGraph) platforms, as well as learning-based segmentation tools such as CellPose and StarDist.

      Rather than claiming superiority, we will clarify trade-offs, emphasizing that AutoMorphoTrack prioritizes:

      Transparency and parameter interpretability,

      Lightweight dependencies suitable for small live-imaging datasets,

      Direct integration of morphology, tracking, and colocalization in a single workflow, and

      Ease of modification for domain-specific use cases.

      (3) AI / chatbot integration

      We appreciate this critique and agree that the current description is insufficiently precise. AutoMorphoTrack does not implement a native natural-language interface. Instead, our intent was to convey that the workflow can be executed and modified with assistance from external large language models (LLMs) in a notebook-based environment.

      In revision, we will revise this section to:

      Clearly distinguish AutoMorphoTrack’s functionality from that of external LLM tools,

      Remove any implication of a built-in AI interface, and

      Provide concrete, reproducible examples of how non-coding users may interact with the pipeline using natural-language prompts mediated by external tools.

      Reviewer #2

      We thank the reviewer for their detailed and technically rigorous evaluation. We appreciate the recognition of the workflow’s motivation and structure, and we agree that several aspects of validation, positioning, and quantitative reporting must be strengthened.

      (1) AI-assisted / natural-language functionality

      We agree with this critique. AutoMorphoTrack does not provide a native natural-language execution layer, and the manuscript currently overstates this aspect. In revision, we will explicitly scope any reference to AI assistance as external, optional support for code generation and parameter editing, with clearly documented examples and stated limitations.

      We agree that conflating external LLM capabilities with the software itself risks misleading readers, and we will correct this accordingly.

      (2) Lack of quantitative validation

      We fully agree that the current manuscript lacks formal quantitative validation. In the revised version, we will add a dedicated validation section including:

      Segmentation accuracy compared to expert annotations using overlap metrics (e.g., Dice / IoU),

      Tracking fidelity assessed using manually annotated tracks and/or synthetic ground truth,

      Sensitivity analyses for key parameters (e.g., thresholding and linking distance), and

      Explicit discussion of failure modes and quality-control indicators.
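
      The overlap metrics named in the list above (Dice / IoU) can be sketched as follows. This is a minimal illustration on two tiny hypothetical binary masks, not the authors' validation code; the function name `dice_and_iou` and the mask values are assumptions made for the example.

```python
def dice_and_iou(predicted, reference):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    pred_pixels = {
        (r, c)
        for r, row in enumerate(predicted)
        for c, value in enumerate(row)
        if value
    }
    ref_pixels = {
        (r, c)
        for r, row in enumerate(reference)
        for c, value in enumerate(row)
        if value
    }
    intersection = len(pred_pixels & ref_pixels)
    union = len(pred_pixels | ref_pixels)
    total = len(pred_pixels) + len(ref_pixels)
    # Convention chosen here: two empty masks count as perfect agreement.
    dice = 2 * intersection / total if total else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Hypothetical masks: an automated segmentation and an expert annotation.
segmentation = [[1, 1, 0],
                [0, 1, 0]]
annotation = [[1, 0, 0],
              [0, 1, 1]]

dice, iou = dice_and_iou(segmentation, annotation)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

      Both scores range from 0 (no overlap) to 1 (identical masks), and Dice is never smaller than IoU for the same pair of masks.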

      We acknowledge that without such validation, claims of robustness are not sufficiently supported.

      (3) Benchmarking and positioning relative to existing tools

      We agree and will substantially strengthen AutoMorphoTrack’s benchmarking and positioning relative to existing platforms. Rather than framing novelty algorithmically, we will clarify that the primary contribution is a reproducible, integrated workflow designed specifically for two-organelle live imaging in neurons, with transparent parameters and standardized outputs.

      We note that our goal is not to exhaustively benchmark against all available tools, but rather to provide representative comparisons that clarify operating regimes, assumptions, and trade-offs. We will add a comparative table and/or qualitative comparison highlighting strengths, assumptions, and limitations relative to existing tools.

      (4) Core algorithms and robustness

      We agree that reliance on threshold-based segmentation introduces sensitivity to imaging conditions. In revision, we will:

      Explicitly discuss the operating regime and assumptions under which AutoMorphoTrack performs reliably,

      Clarify that the framework is modular and can accept alternative segmentation backends, and

      Include guidance on when outputs should be treated with caution.

      (5) Figure, metric, and statistical issues

      We thank the reviewer for identifying several critical issues and agree that these undermine confidence. In revision, we will correct all figure, metric-definition, and reporting inconsistencies, including:

      Resolving circularity values exceeding 1 by correcting computation and/or labeling errors,

      Revising physically invalid displacement plots and clarifying kernel-density limitations,

      Ensuring colocalization metrics are consistently defined, named, and interpreted, with explicit clarification of whether calculations are intensity- or mask-based,

      Correcting figure legends to match displayed panels, and

      Clearly reporting sample size, sampling units, and statistical assumptions, including handling of multiple comparisons where applicable.
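
      For the circularity issue in the list above, a short sketch may help. It assumes the common definition circularity = 4πA / P², which the response does not state explicitly; under that definition the value cannot exceed 1 for a true continuous outline, so values above 1 usually point to a perimeter that was underestimated on a pixel grid or to a labeling error. The numbers in the last case are hypothetical.

```python
import math


def circularity(area, perimeter):
    """Circularity as 4*pi*A / P**2 (assumed definition; 1.0 for a perfect circle)."""
    return 4 * math.pi * area / perimeter ** 2


# Ideal shapes with exact area and perimeter.
radius = 3.0
circle_value = circularity(math.pi * radius ** 2, 2 * math.pi * radius)
square_value = circularity(area=4.0, perimeter=8.0)  # side length 2

# Hypothetical small object whose pixel-based perimeter is underestimated.
suspicious_value = circularity(area=12.0, perimeter=10.0)

for name, value in [("circle", circle_value),
                    ("square", square_value),
                    ("suspicious", suspicious_value)]:
    flag = "  <- exceeds 1, check perimeter estimate" if value > 1 + 1e-9 else ""
    print(f"{name}: {value:.3f}{flag}")
```

      A simple quality-control step of this kind flags physically impossible values instead of silently plotting them.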

      (6) Value-added demonstration

      We agree that the manuscript would benefit from a clearer demonstration of value-added use cases. In revision, we will include at least one realistic example showing how AutoMorphoTrack enables a complete, reproducible analysis workflow with reduced setup burden compared to manually assembling multiple tools.

      (7) Editorial suggestions

      We agree and will streamline the Results section to reduce procedural repetition and focus more on validation, limitations, and quality-control guidance.

      Reviewer #3

      We thank the reviewer for their positive assessment of clarity and organization, and for the constructive practical feedback.

      Installation issues

      We appreciate the detailed report of installation failures and acknowledge that the current packaging and distribution are inadequate. Prior to revision, we will:

      Fix the package structure to support standard installation methods,

      Ensure all required files (e.g., setup configuration, README) are correctly included,

      Test installation on clean environments across platforms, and

      Correct broken links to notebooks and documentation.

      We agree that without a functional installation pathway, the utility of the tool is severely limited.

      AI-assisted claims

      We agree with the reviewer and echo our responses above. The AI-assisted description will be clarified and appropriately scoped in the revised manuscript.

      Additional suggestions

      Color accessibility: We will revise all figures to use colorblind-safe palettes.

      Velocity–displacement diagonal: We will explicitly explain the origin of this relationship, including whether it reflects dataset properties, tracking assumptions, or minimum detectable motion.

      Integrated correlation metric: We agree that Spearman correlation is more appropriate for many of these relationships and will replace Pearson correlations accordingly.

      Supplementary movies: We agree that providing raw movies would improve interpretability and will add representative examples as supplementary material.

    1. Benefits of Functions# There are several advantages to creating and using functions in computer programs, such as: Reusing code instead of repeating code: When we find ourselves repeating a set of actions in our program, we end up writing (or copying) the same code multiple times. If we put that repeated code in a function, then we only have to write it once and then use that function in all the places we were repeating the code. Single, standardized definitions: Let’s say we made code that takes a name and tries to split it into a first name and last name, and we have that code copied in several places in our program. Then we realize that our code isn’t handling some last names correctly, like “O’Reilly” and “Del Toro.” If we fix this bug in one of the places the code is copied in our program it still will be broken elsewhere, so we have to find all the places and fix it there. If, on the other hand we had the code to split names in a function, and used that function everywhere else, then we only have to fix the bug inside that one function and our code everywhere is fixed. Code organization: Making functions also can help us organize our code. It lets us give a name to a block of code, and when we use it, those function names can help make the code more understandable. Making code as functions also helps in letting us put those pieces of code in other files or in code libraries, so the file we are working on is smaller and easier to manage.

      This explanation clearly shows how functions improve efficiency and clarity in programming by reducing repetition, standardizing logic, and making code easier to read and manage.
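
      The name-splitting example from the passage can be sketched in Python. The function `split_name` is hypothetical and deliberately simple (it treats everything after the first space as the last name); real names are more varied than this rule allows. The point is that the rule lives in one place, so a fix there applies everywhere the function is used.

```python
def split_name(full_name):
    """Split a full name into (first, last) at the first space.

    Everything after the first space is kept together, so multi-word
    last names such as "Del Toro" are not cut apart.
    """
    first, _, last = full_name.strip().partition(" ")
    return first, last


# The same function is reused in several places instead of copied code.
for name in ["Tim O’Reilly", "Guillermo Del Toro", "Cher"]:
    first_name, last_name = split_name(name)
    print(f"first={first_name!r}, last={last_name!r}")
```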

    1. Web crawlers are pieces of code that find and down

      The description of web crawlers highlights how information is shaped before a query is even entered. I did not know that crawlers only index sites that allow access, so much of the internet may be unavailable due to policy restrictions. It also makes me wonder who regulates the ethical decisions about which information is presented and which remains invisible. With that in mind, I think this setup puts smaller organizations at a disadvantage.
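
      One common way a site signals what crawlers may access is a `robots.txt` file. The sketch below uses Python's standard `urllib.robotparser` on a hypothetical rule set; the crawler name and the example.org URLs are made up for illustration. Note that `robots.txt` is a voluntary convention: well-behaved crawlers follow it, but it is not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse the rules without any network access

# Ask whether a (hypothetical) crawler may fetch two different pages.
can_read_home = parser.can_fetch("ExampleCrawler", "https://example.org/index.html")
can_read_private = parser.can_fetch("ExampleCrawler", "https://example.org/private/report.html")

print("home page allowed:", can_read_home)
print("private page allowed:", can_read_private)
```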

    1. 4.1.2. Basic Data Types# First, we’ll look at a few basic data storage types. We’ll also be including some code examples you can look at, though don’t worry yet if you don’t understand the code, since we’ll be covering these in more detail throughout the rest of the book. Booleans (True / False)# Binary consisting of 0s and 1s make it easy to represent true and false values, where 1 often represents true and 0 represents false. Most programming languages have built-in ways of representing True and False values. Fig. 4.4 A blue checkmark is something an account either has or doesn’t so it can be stored as a binary value.# Booleans are often created when doing some sort of comparison or test, like: Do I have enough money in my wallet to pay for the item? Does this tweet start with “hello” (meaning it is a greeting)? Example Python code: # Save a boolean value in a variable called does_user_have_blue_checkmark does_user_have_blue_checkmark = True # Save a boolean value in a variable based on a comparison. # The code checks if a wallet has more in it than the cost of the item # which will be True or False, and be saved in has_enough_money has_enough_money = money_in_wallet > cost_of_item # Save a boolean value in a variable based on a function call. # The code checks if the text of a tweet (stored in tweet_text) starts # with "Hello", which will be True or False, and be saved in is_greeting is_greeting = tweet_text.startswith("Hello") Numbers# Numbers are normally stored in two different ways: Integer: whole numbers like 5, 37, -10, and 0 Floating point numbers: these can represent decimals like: 0.75, -1.333, and 3 x 10 ^ 8 Fig. 4.5 The number of replies, retweets, and likes can be represented as integer numbers (197.8K can be stored as a whole number like 197,800).

      This section helped me clearly see how different data types represent different kinds of information. Booleans are especially interesting because they force complex situations into true/false decisions, which can oversimplify reality. It also made me realize how choices about numbers and strings affect what computers can accurately store and how much meaning might be lost through rounding or categorization.
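
      A runnable version of the boolean and number examples might look like the sketch below. The wallet amount, item cost, and tweet text are hypothetical values added so the code can run on its own; note that Python's built-in string method is spelled `startswith`.

```python
# A boolean stored directly.
does_user_have_blue_checkmark = True

# A boolean produced by a comparison (hypothetical amounts).
money_in_wallet = 20.00
cost_of_item = 12.50
has_enough_money = money_in_wallet > cost_of_item

# A boolean produced by a method call (hypothetical tweet text).
tweet_text = "Hello world, this is my first post"
is_greeting = tweet_text.startswith("Hello")

# Numbers: an integer count and a floating point value.
number_of_likes = 197_800   # "197.8K" stored as a whole number
speed_of_light = 3e8        # 3 x 10 ^ 8 stored as a floating point number

print(has_enough_money, is_greeting)
print(type(number_of_likes).__name__, type(speed_of_light).__name__)
```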

    1. Dictionaries# The other method of grouping data that we will discuss here is called a “dictionary” (sometimes also called a “map”). You can think of this as like a language dictionary where there is a word and a definition for each word. Then you can look up any name or word and find the value or definition. Example: An English Language Dictionary with definitions of three terms: Social Media: An internet-based platform used for people to form connections to each other and share things. Ethics: Thinking systematically about what makes something morally right or wrong, or using ethical systems to analyze moral concerns in different situations Automation: Making a process or activity that can run on its own without needing a human to guide it. The Dictionary data type allows programmers to combine several pieces of data by naming each piece. When we do this, the dictionary will have a number of names, and for each of those names a piece of information (called a “value” in this context). Dictionary: Name 1: Value 1 Name 2: Value 2 Name 3: Value 3 So if we look at the example tweet, we can combine all the data in a dictionary. Fig. 4.9 A tweet with photos of a cute puppy! (source)# Dictionary (with some of the data): user_name: “WeRateDogs®” user_handle: “@dog_rates” user_has_blue_checkmark: True tweet_text: “This is Woods. He’s here to help with the dishes. Specifically the pre-rinse, where he licks every item he can. 12/10” number_of_replies: 1533 number_of_retweets: 26200 number_of_likes: 197800 Example Python code: # Save some info about a tweet in a variable called tweet_info tweet_info = { "user_name": "WeRateDogs®", "user_handle": "@dog_rates", "user_has_blue_checkmark": True, "tweet_text": "This is Woods. He’s here to help with the dishes. Specifically the pre-rinse, where he licks every item he can. 12/10", "number_of_replies": 1533, "number_of_retweets": 26200, "number_of_likes": 197800 } Note: We’ll demonstrate dictionaries later in Chapter 5: History of Social Media, and Chapter 8: Data Mining. Groups within Groups# We can use dictionaries and lists together to make lists of dictionaries, lists of lists, dictionaries of lists, or any other combination. So for example, I could make a list of Twitter users. Each Twitter user could be a dictionary with info about that user, and one piece of information it might have is a list of who that user is following. List of users: User 1: Username: kylethayer (a String) Twitter handle: @kylemthayer (a String) Profile Picture: [TODO picture here] (an image) Follows: @SusanNotess, @UW, @UW_iSchool, @ajlunited, … (a list of Strings) User 2: Username: Dr Susan Notess (a String) Twitter handle: @SusanNotess (a String) Profile Picture: [TODO picture here] (an image) Follows: @kylemthayer, @histoftech, @j_kalla, @dbroockman, @qaxaawut, @shengokai, @laniwhatison (a list of Strings)

      I like the dictionary analogy because it makes clear how data gets structured and labeled. By assigning names to values, dictionaries don’t just store information, they also shape how programmers interpret and access it. This made me realize that how data is organized can influence what questions are easy—or hard—to ask later.
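
      The point about naming values can be made concrete with a short sketch that looks things up by name and nests a list inside each dictionary. The data come from the passage; the key names in `users` and the variable `they_follow_each_other` are illustrative choices, and the first user's follow list is truncated in the source, so only the handles shown there are included.

```python
# A dictionary describing one tweet (subset of the fields in the passage).
tweet_info = {
    "user_name": "WeRateDogs®",
    "user_handle": "@dog_rates",
    "user_has_blue_checkmark": True,
    "number_of_replies": 1533,
    "number_of_retweets": 26200,
    "number_of_likes": 197800,
}

# A list of dictionaries, each holding a list of followed handles.
users = [
    {
        "username": "kylethayer",
        "handle": "@kylemthayer",
        # The source list is truncated; only the handles shown are used.
        "follows": ["@SusanNotess", "@UW", "@UW_iSchool", "@ajlunited"],
    },
    {
        "username": "Dr Susan Notess",
        "handle": "@SusanNotess",
        "follows": ["@kylemthayer", "@histoftech", "@j_kalla", "@dbroockman",
                    "@qaxaawut", "@shengokai", "@laniwhatison"],
    },
]

# Looking up a value by its name is direct.
likes = tweet_info["number_of_likes"]

# A question that this structure makes easy: do two users follow each other?
they_follow_each_other = (
    users[1]["handle"] in users[0]["follows"]
    and users[0]["handle"] in users[1]["follows"]
)

print(likes, they_follow_each_other)
```

      A question such as "who follows a given user?" is harder with this layout, because it requires scanning every user's list; that is one way the choice of structure shapes which questions are easy to ask.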

    1. 3.1. Definition of a bot# There are several ways computer programs are involved with social media. One of them is a “bot,” a computer program that acts through a social media account. There are other ways of programming with social media that we won’t consider a bot (and we will cover these at various points as well): The social media platform itself is run with computer programs, such as recommendation algorithms (chapter 12). Various groups want to gather data from social media, such as advertisers and scientists. This data is gathered and analyzed with computer programs, which we will not consider bots, but will cover later, such as in Chapter 8: Data Mining. Bots, on the other hand, will do actions through social media accounts and can appear to be like any other user. The bot might be the only thing posting to the account, or human users might sometimes use a bot to post for them. Note that sometimes people use “bots” to mean inauthentically run accounts, such as those run by actual humans, but are paid to post things like advertisements or political content. We will not consider those to be bots, since they aren’t run by a computer. Though we might consider these to be run by “human computers” who are following the instructions given to them, such as in a click farm: Fig. 3.1 A photo that is likely from a click-farm, where a human computer is paid to do actions through multiple accounts, such as like a post or rate an app. For our purposes here, we consider this a type of automation, but we are not considering this a “bot,” since it is not using (electrical) computer programming.

      This section helped clarify that not all automation on social media counts as a bot. I found it especially useful that the definition focuses on whether the account is operated by computer code rather than by humans, even if those humans behave mechanically, like in click farms. This distinction makes it easier to think more precisely about responsibility and accountability when automation affects online spaces.

    1. In Alabama, a property tax was proposed; in Texas, the sale of public lands was offered; in Maryland, changes to the state tax code to allow local taxation were put forth; in South Carolina, Murray suggested that unclaimed Civil War bounties could be used; and North Carolina debated a specific consumer tax for education. Governor Harrison Reed’s plan in Florida was to increase land assessments to fund public goods, and this model was followed in other Southern states by black political leaders

      Needed to raise money for a school system

    Annotators

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1(Public review):

      Summary:

      In this study, the authors distinguished afferent inputs to different cell populations in the VTA using dimensionality reduction approaches and found significantly distinct patterns between normal and drug treatment conditions. They also demonstrated negative correlations of the inputs induced by drugs with gene expression of ion channels or proteins involved in synaptic transmission and demonstrated the knockdown of one of the voltage-gated calcium ion channels caused decreased inputs.

      Weaknesses:

      (1) For quantifications of brain regions in this study, boundaries were based on the Franklin-Paxinos (FP) atlas according to previous studies (Beier KT et al 2015, Beier KT et al 2019). It has been reported that significant discrepancies exist between the anatomical labels in the FP atlas and the Allen Brain Atlas (ref: Chon U et al., Nat Commun 2019). Although a conversion summary is provided as a sheet, the authors need to describe how consistent the brain boundaries defined in the manuscript are with the Allen Brain Atlas by adding histology images. Also, I wonder how reliable the manual annotations were across more than a hundred animals. The authors should briefly explain this rather than citing previous studies in the Material and Methods Section.

      We thank the reviewer for their attention to this point; indeed, neuroanatomical detail is often overlooked in modern neuroscience, occasionally leading to spurious conclusions. We acknowledge that there are significant discrepancies in brain region definitions across atlases, which can make cross-study comparisons difficult. Here, all cells were manually quantified by Dr. Kevin Beier, as in previous studies (Beier et al., Cell 2015; Nature 2017; Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology, 2025). As such, these studies are internally consistent in their definition of brain regions, which is critical here since our analysis in this manuscript relates to data quantified by a single individual. Several brain regions were quite easy to distinguish anatomically, such as the medial habenula and lateral habenula. Others, such as the extended amygdala area, are much more difficult. We have now provided example images in Figure S1 that detail the anatomical boundaries that we used, overlaid on images of Neurotrace blue (fluorescent Nissl stain).

      (2) Regarding the ellipsoids in the PC, although it's written in the manuscript that "Ellipsoids were centered at the average coordinate of a condition and stretched one standard deviation along the primary and secondary axes", it's intuitively hard to understand in some figures such as Figure 2O, P and Figure S1. The authors need to make their data analysis methods more accessible by providing source code to the public.

      The source code is now available to the public at https://github.com/ktbartas/Bartas_et_al_eLife_2024, which is noted in the Code Availability statement. The code for generating ellipsoids is in the first notebook, `0-dataexploration-master-euclidean.ipynb`, in the function `confidence_ellipse`, which is called from `make_pca_plots` and `umap_and_heatmap`. Example plots are rendered in the notebooks and can be viewed directly on GitHub.
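
      To make the ellipsoid description easier to follow, here is a minimal sketch of one way to read "centered at the average coordinate of a condition and stretched one standard deviation along the primary and secondary axes". It is not the authors' `confidence_ellipse` function: the name `ellipse_parameters`, the use of the sample covariance, and the toy points are all assumptions made for illustration.

```python
import math


def ellipse_parameters(points):
    """Return (center, major_sd, minor_sd, angle_deg) for 2D points.

    The center is the mean coordinate; the semi-axes are one standard
    deviation along the two principal directions of the sample covariance.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Eigenvalues of the 2x2 covariance matrix in closed form.
    half_trace = (var_x + var_y) / 2
    root = math.hypot((var_x - var_y) / 2, cov_xy)
    major_sd = math.sqrt(half_trace + root)
    minor_sd = math.sqrt(max(0.0, half_trace - root))

    # Orientation of the major axis relative to the x axis.
    angle_deg = math.degrees(0.5 * math.atan2(2 * cov_xy, var_x - var_y))
    return (mean_x, mean_y), major_sd, minor_sd, angle_deg


# Hypothetical coordinates of four animals from one condition in PC space.
toy_points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
center, major_sd, minor_sd, angle_deg = ellipse_parameters(toy_points)
print(center, round(major_sd, 3), round(minor_sd, 3), round(angle_deg, 1))
```

      With points lying on a diagonal line, the ellipse collapses to a segment tilted at 45 degrees, which is a quick sanity check for the geometry.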

      (3) In histology images (Figure 1B and 3K), the authors need to add dashed lines or arrows to guide the reader's attention.

      Dashed lines have been added to these figure panels as requested.

      (4) In Figure 2A and G, apparently there are significant differences in other brain regions such as NAcMed or PBN. If they are also statistically significant, the authors should note them as well and draw asterisks(*).

      We appreciate the care in ensuring that statistics are being applied and shown appropriately. In panel A (now Figure 3A), the two-way ANOVA interaction term was not significant (p = 0.9365), so we did not find it justified to do further comparisons. However, for Figure 3G, the interaction term was significant (p = 0.0001), and thus further pairwise comparisons were performed with Sidak's correction for multiple comparisons. When done, the only two brain regions that were significantly different were the DStr (p = 0.0051) and GPe (p = 0.0036). While the NAcMed and PBN visually look different, according to the corrected statistics, they were not significantly different (NAcMed p = 0.5037, PBN p = 0.8123). The notations in our original figure thus accurately reflected these statistics.
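
      As a reminder of how Sidak's correction behaves, the sketch below applies the standard formula, adjusted p = 1 - (1 - p)^m for m comparisons, to hypothetical unadjusted p-values. The region labels and raw values are invented for illustration and are not the values from this study; statistics packages may also differ in implementation details.

```python
def sidak_adjust(p_values):
    """Sidak-adjusted p-values for a family of m comparisons."""
    m = len(p_values)
    return [1 - (1 - p) ** m for p in p_values]


# Hypothetical unadjusted p-values for five pairwise comparisons.
raw = {"region_A": 0.001, "region_B": 0.010, "region_C": 0.040,
       "region_D": 0.200, "region_E": 0.600}

adjusted = dict(zip(raw, sidak_adjust(list(raw.values()))))

for region, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{region}: raw p = {raw[region]:.3f}, adjusted p = {p_adj:.4f} ({verdict})")
```

      The example shows how a difference that looks convincing on its own (raw p = 0.040) can fail to reach significance once the number of comparisons is taken into account.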

      (5) In Figure 2N about the spatial distribution of starter cells, the authors need to add histology images for each experimental condition (i.e. saline, fluoxetine, cocaine, methamphetamine, amphetamine, nicotine, and morphine) as supplementary figures.

      We have now provided these as Figure S2.

      (6) In the manuscript, it is necessary to explain why Cacna1e was selected among other calcium ion channels.

      We have added a sentence to the "Functional validation of link between gene expression and RABV labeling" section (lines 722-724).

      Reviewer #2 (Public review):

      The application of rabies virus (RabV)-mediated transsynaptic tracing has been widely utilized for mapping cell-type-specific neural connectivities and examining potential modifications in response to biological phenomena or pharmacological interventions. Despite the predominant focus of studies on quantifying and analyzing labeling patterns within individual brain regions based on labeling abundance, such an approach may inadvertently overlook systemic alterations. There exists a considerable opportunity to integrate RabV tracing data with the global connectivity patterns and the transcriptomic signatures of labeled brain regions. In the present study, the authors take an important step towards achieving these objectives. Specifically, the authors conducted an intensive reanalysis of a previously generated large dataset of RabV tracing to the ventral tegmental area (VTA) using dimension reduction methods such as PCA and UMAP. This reaffirmed the authors' earlier conclusion that different cell types in the VTA, namely dopamine neurons (DA) and GABAergic neurons, exhibit quantitatively distinct input patterns, and a single dose of addictive drugs, such as cocaine and morphine, induced altered labeling patterns. Additionally, the authors illustrate that distinct axes of PCA can discriminate experimental variations, such as minor differences in the injection site of viral tracers, from bona fide alterations in labeling patterns caused by drugs of abuse. While the specific mechanisms underlying altered labeling in most brain regions remain unclear, whether involving synaptic strength, synaptic numbers, pre-synaptic activities, or other factors, the present study underscores the efficacy of an informatics approach in extracting more comprehensive information from the RabV-based circuit mapping data.
Moreover, the authors showcased the utility of their previously devised bulk gene expression patterns inferred by the Allen Gene Expression Atlas (AGEA) and "projection portrait" derived from bulk axon mapping data sourced from the Allen Mouse Brain Connectivity Atlas. The utilization of such bulk data rests upon several limitations. For instance, the collection of axon mapping data involves an arbitrary selection of both cell type-specific and non-specific data, which might overlook crucial presynaptic partners, and often includes contamination from neighboring undesired brain regions. Concerns arise regarding the quantitativeness of AGEA, which may also include the potential oversight of key presynaptic partners. Nevertheless, the authors conscientiously acknowledged these potential limitations associated with the dataset. Notably, building on the observation of a positive correlation between the basal expression levels of Ca2+ channels and the extent of drug-induced changes in RabV labeling patterns, the authors conducted a CRISPRi-based knockdown of a single Ca2+ channel gene. This intervention resulted in a reduction of RabV labeling, supporting that the observed gene expression patterns have causality in RabV labeling efficiency. While a more nuanced discussion is necessary for interpreting this result (see below), overall I commend the authors for their efforts to leverage the existing dataset in a more meaningful way. This endeavor has the potential to contribute significantly to our understanding of the mechanisms underlying alterations in RabV labeling induced by drugs of abuse. Finally, drawing upon the aforementioned reanalysis of previous data, the authors underscored that a single administration of ketamine/xylazine anesthesia could induce enduring modifications in RabV labeling patterns for VTA DA neurons, specifically those projecting to the nucleus accumbens and amygdala. 
Given the potential impact of such alterations on motivational behaviors at a broader level, I fully agree that prudent consideration is warranted when employing ketamine/xylazine for the investigation of motivational behaviors in mice.

      Specific Points:

      (1) Beyond advancements in bioinformatics, readers may find it insightful to explore whether the PCA/UMAP-based approach yields novel biological insights. For example, the authors are encouraged to discuss more functional implications of PBN and LH in the context of drugs of abuse, as their labeling abundance could elucidate the PC2 axis in Fig. 2M.

      Thank you for this suggestion: we added text (Lines 787-795) discussing the LH and PBN (and GPe) specifically, but also highlighted the importance of our approach in hypothesis-generating science.

      (2) While I appreciate the experimental data on Cacna1e knockdown, I am unclear about the rationale behind specifically focusing on Cacna1e. The logic behind the statement, "This means that expression of this gene is not inhibitory towards RABV transmission," is also unclear. Loss-of-function experiments only signify the necessity or permissive functions of a gene. In this context, Cacna1e expression levels are required for efficient RabV labeling, but this neither supports nor excludes the possibility that this gene expression instructively suppresses RabV labeling/transmission, which could be assessed through gain-of-function experiments.

      We thank the reviewer for their suggestions regarding this result, and agree that a gain-of-function would be required to provide clearer evidence on this point.  We therefore understand that our original phrasing may be misleading. Thus, we have edited this section to the more conservative statement: “These results indicate that reduced levels of Cacna1e likely lower the number of RABV-labeled inputs from the NAcLat, and directly link the levels of Cacna1e and RABV input labeling” (lines 742-744) - we refrain from over-interpreting the results. As mentioned above in response to R1, we added a sentence to explain the rationale behind focusing on Cacna1e (lines 722-724).

      Reviewer #3 (Public Review):

      Summary:

      Authors mapped monosynaptic inputs to dopamine, GABA, and glutamate neurons in VTA under different anesthesia methods, and under drugs (cocaine, morphine, methamphetamine, amphetamine, nicotine, fluoxetine). They found that input patterns under different conditions are separated, and identified some key brain areas to contribute to such separation. They also searched a database for gene expression patterns that are common across input brain areas with some changes by anesthesia or drug administration.

      Strengths:

      The whole-brain approach to address drug effects is appealing and their conclusion is clear. The methodology and motivation are clearly explained.

      Weaknesses:

      While gene expression analyses may not be related to their findings on the anatomical effects of drugs, this will be a nice starting point for follow-up studies. 

      We understand and agree with the suggestion that gene expression allows us to provide correlative observations between in situ hybridization datasets and rabies mapping datasets, and that these results do not show causality. As such, future studies would be needed to assess this in more detail. We have added a line in the discussion to this effect (lines 851-853).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) There are a couple of packages available for 3D whole-brain reconstructions based on Allen Brain Atlas (eg. https://github.com/tractatus/wholebrain, https://github.com/lahammond/BrainJ), which would be helpful to align with the gene expression or other data from Allen Institute.

      This comment is related to the weakness raised by R1 that we responded to earlier in this rebuttal (see comment 1), concerning the discrepancies between the Franklin-Paxinos atlas and the Allen Brain Atlas. We agree that a systematic comparison of these two atlases using a tool like wholebrain or BrainJ would be valuable for the field. However, it would be a substantial amount of work, and would likely be an independent study in itself. We believe that the resolution of these atlases was sufficient to support our key conclusions here (e.g., identifying gene expression patterns that relate to drug-induced changes in rabies virus labeling patterns, and developing a testable hypothesis for CRISPR-based gene editing). Our analyses are also based on the same atlases and region definitions that have been applied in our previous studies (e.g., Beier et al., Cell 2015; Beier et al., Nature 2017; Beier et al., Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology 2025). The expression of Cacna1e is relatively consistent across the NAc, as we have now detailed in Figure S13.

      (2) There are so far two kinds of rabies virus strains available in the neuroscience field (SAD-B19 or CVS-N2c). It is recommended to describe which strain was used in the Material and Methods Section because labeling efficiency and toxicity is quite different between the strains (Reardon TR et al., Neuron 2016).

      We have now noted that we used SAD B19 for all experiments (Lines 141-142).

      Minor corrections to the text and figures:

      (1)  In Figure 1A, the color differences are not clear (i.e. light gray and dark gray). The figure can be simplified.

      In addition, generally, images/figures are recommended not to be overlapped with other figures/images (Figures 2A-F, 2G-L).

      (2)  In Figures 7C and D, the authors could add enlarged views of starter cells in VTA and NAcLat.

      We have attempted to simplify schematics and figures throughout. High-magnification images of cells have been added as insets in what is now Figure 10 (formerly Figure 7).

      Reviewer #2 (Recommendations For the authors):

      The number of animals for each graph should be explicated within the figure legend. For example, Figure 1C and Figure 7E lack this information. It is also advisable to delineate the definition of error bars within the figure legend.

      We have now added mouse numbers to all figures and/or legends, as appropriate. We also indicated in the legend at the end of Figure 1 how error bars and asterisks are defined. Furthermore, we added a sentence to the methods saying that in UMAP and PCA plots each dot is an animal (lines 244-245).

      The visual representations, particularly in Figures 1 and 3, are overcrowded. Furthermore, the arrangement of figure subpanels does not consistently follow the order of explanation in the main text, which significantly compromises readability. The authors are encouraged to consider splitting dense figures in two if there is no upper limit on the number of figures. To illustrate, in Figure 3Q, crucial details about experimental conditions are denoted by numerical references, owing to spatial constraints.

      We agree that the figure layout and its misalignment with a linear read of the text were not ideal. Therefore, we broke our figures, especially the original Figures 1-4, into multiple sub-figures, including both main and supplemental figures. This freed up space to rearrange the figure panels, allowing the story to be told in a linear fashion. All figures and panels can now be read in order.

      I am seeking clarification on how to interpret the term "overlap" at the bottom of figures illustrating Gene Ontology analysis.

      We have clarified the meaning of overlap in this context (lines 324-325): The ‘overlap’ term on the x-axis of these plots means the number of genes in the correlated gene lists that were also within the list of genes for the corresponding GO term.
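
      The definition of "overlap" above amounts to a set intersection. The following is a minimal sketch of that count, assuming placeholder gene names and a hypothetical helper `go_overlap`; neither comes from the manuscript.

```python
def go_overlap(correlated_genes, go_term_genes):
    """Return the genes present in both the correlated-gene list and the GO term's gene list."""
    return set(correlated_genes) & set(go_term_genes)


# Placeholder gene names, for illustration only (not real data).
correlated = ["GeneA", "GeneB", "GeneC", "GeneD"]
go_term = ["GeneB", "GeneD", "GeneX"]

shared = go_overlap(correlated, go_term)
overlap_count = len(shared)  # the quantity described as 'overlap' on the x-axis
```

      Under this reading, a larger overlap means that more of the correlated genes belong to the GO term in question.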

      The authors could provide Cacna1e gene expression patterns within the NAc from the AGEA data.

      Cacna1e expression data are now provided in Figure S13.

      Additionally, the meaning of "controls" in Figure 7F, along with the "No gRNA" condition, remains ambiguous. While the text mentions "no shRNA", the involvement of shRNA in this experiment lacks clarity.

      We now clarify that the control conditions are based on previously published data where no AAVs were injected into NAcLat. This is now clarified in the legend for Figure 10F (lines 1277-1578). We also corrected “shRNA” to “gRNA” in the text.

    1. The most important thing to keep in mind here is that Meta’s encryption happens on the client application, the one you run on your phone. If the claims in this lawsuit are true, then Meta would have to alter the WhatsApp application so that plaintext (unencrypted) data would be uploaded from your app’s message database to some infrastructure at Meta, or else the keys would. And this should not be some rare, occasional glitch. The allegations in the lawsuit state that this applied to nearly all users, and for every message ever sent by those users since they signed up. Those constraints would tend to make this a very detectable problem. Even if WhatsApp’s app source code is not public, many historical versions of the compiled app are available for download. You can pull one down right now and decompile it using various tools, to see if your data or keys are being exfiltrated. I freely acknowledge that this is a big project that requires specialized expertise — you will not finish it by yourself in a weekend (as commenters on HN have politely pointed out to me.) Still, reverse-engineering WhatsApp’s client code is entirely possible and various parts of the app have indeed been reversed several times by various security researchers. The answer really is knowable, and if there is a crime, then the evidence is almost certainly* right there in the code that we’re all running on our phones.

      If the claim is correct, one could reverse-engineer the app to check whether it is true. Not a low hurdle, but possible: 'the answer is knowable'.

    2. In the case of WhatsApp, the application software is written by a team inside of Meta. This wouldn’t necessarily be a bad thing if the code was open source, and outside experts could review the implementation. Unfortunately WhatsApp is closed-source, which means that you cannot easily download the source code to see if encryption performed correctly, or performed at all. Nor can you compile your own copy of the WhatsApp app and compare it to the version you download from the Play or App Store. (This is not a crazy thing to hope for: you actually can do those things with open-source apps like Signal.)

      Because WhatsApp is closed source, outsiders cannot verify that it works as advertised, unlike Signal.

    1. Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      Weaknesses:

      (1) I am not fully satisfied with the authors' explanation of pattern shifts occurring 11.9s prior to event boundaries. The average length of an event was 21.4 seconds. The window around the identified event boundaries was 20 seconds on either side. The earliest identified pattern shift peaks occur 11.9s prior to the actual event boundary. This would mean that, on average, a pattern shift is occurring at approximately the midway point of the event (11.9s prior to the boundary of a 21.4s event is approximately the middle of the event). The authors offer an explanation in which top-down regions signal an update that propagates to lower-order regions closer to the boundary. To make this interpretation concrete, they added an example: "in a narrative where a goal is reached midway (for instance, a mystery solved before the story formally ends), higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions". This might make sense in a one-off case of irregular storytelling, but it is odd to think this would generalize. If an event is occurring and a given collection of regions represents that event, it does not follow the accepted convention of multivariate representational analysis that that set of regions would undergo such a large shift in patterns in the middle of an event. The stabilization of these patterns taking so long is also odd to me. I suspect some of these findings may be due to the stimuli used in this experiment; I am not confident they would generalize, and I invite the authors to disagree and explain. In the case of the exercise routine video, I try to imagine going from the push-up event to the jumping jack event. The actor stops doing push-ups, stands up, and moves minimally for 16 seconds (these lulls are not uncommon). At that point they start doing jumping jacks. It is immediately evident from that moment on that jumping jacks will be the kind of event you are perceiving, which may explain the long delay in event pattern stabilisation. Then, about 11.9s prior to the end of the event, when the person is still performing jumping jacks (at this point they have been performing jumping jacks for 6 seconds), I would expect the brain to still be expecting this "jumping jacks event". For some reason, at this point multivariate patterns in higher-order regions shift. I do not understand what kind of top-down processing is happening here, and the authors need to be more concrete in their explanation because, as of right now, it is ill-defined. I also recognize that being specific to jumping jacks is maybe unfair, but this would apply to the push-ups, granola bar eating, or table cleaning events in the same manner. I suspect one possibility is that the participants realize that the stereotyped action of jumping jacks is going to continue and, thus, mind-wander to other thoughts while waiting for novel, informative content to be presented. This explanation would challenge the more active top-down processing assumed by the authors.

      I had provided a set of concerns to the authors that were not part of the public review and were not addressed. I was unaware of the exact format of the eLife approach, but I think they are worth open discussion so I am adding them here for consideration. Apologies for any confusion.

      (2) Why did the authors not examine differences in event boundary activity magnitude between the uncertainty and error boundaries? I see that the authors have provided the data on OpenNeuro. However, it seems like the difference in activity maps would not only provide extra contextualization of the findings, but also be fairly trivial to compute. Just by eyeballing the plots, it appears as though there may be activity differences between the two in the mPFC occurring shortly after a boundary. Given this region's role in prediction error and schema, it would be important to understand whether this difference is merely due to thresholding effects or is statistically meaningful.

      (3) Further, the authors omitted all subcortical regions, some of which would be especially interesting, such as the hippocampus, basal ganglia, and ventral tegmental area. These regions have a rich and deep background in event boundary activity and prediction error. Univariate effects in these regions might help contextualize some of the pattern shifts in the cortex.

      (4) I see that field maps were collected, but the fMRIPrep methods state that susceptibility distortion correction was not performed. Is there a reason to omit this?

      (5) How many events were present in the stimuli?

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      We appreciate this reviewer’s recognition of the significance of this research problem, and of the value of the approach taken by this paper.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      We added a brief discussion in the introduction highlighting the complementary advantages of prediction error and prediction uncertainty, and cited prior theoretical work that elaborates on this point. Specifically, we now note that prediction error can act as a reactive trigger, signaling when the current event model is no longer sufficient (Zacks et al., 2007). In contrast, prediction uncertainty is framed as proactive, allowing the system to prepare for upcoming changes even before they occur (Baldwin & Kosie, 2021; Kuperberg, 2021). Together, this makes clearer why these two signals could each provide complementary benefits for effective event model updating.

      "One potential signal to control event model updating is prediction error—the difference between the system’s prediction and what actually occurs. A transient increase in prediction error is a valid indicator that the current model no longer adequately captures the current activity. Event Segmentation Theory (EST; Zacks et al., 2007) proposes that event models are updated when prediction error increases beyond a threshold, indicating that the current model no longer adequately captures ongoing activity. A related but computationally distinct proposal is that prediction uncertainty (also termed "unpredictability") can serve as a control signal (Baldwin & Kosie, 2021). The advantage of relying on prediction uncertainty to detect event boundaries is that it is inherently proactive: the cognitive system can start looking for cues about what might come next before the next event starts (Baldwin & Kosie, 2021; Kuperberg, 2021). "

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      We addressed this concern by adding an analysis that explicitly tests the unique contributions of prediction error– and prediction uncertainty–driven boundaries to neural pattern shifts. In the revised manuscript, we describe how we fit a combined FIR model that included both boundary types as predictors and then compared this model against versions with only one predictor. This allowed us to identify the variance explained by each boundary type over and above the other. The results revealed two partially dissociable sets of brain regions sensitive to error- versus uncertainty-driven boundaries (see Figure S1), strengthening our argument that these signals make distinct contributions.

      "To account for the correlation between uncertainty-driven boundaries and error-driven boundaries, we also fitted a FIR model that predicted pattern dissimilarity from both types of boundaries (combined FIR) for each parcel. Then, we performed two likelihood ratio tests: combined FIR to error FIR, which measures the unique contribution of uncertainty boundaries to pattern dissimilarity, and combined FIR to uncertainty FIR, which measures the unique contribution of error boundaries to pattern dissimilarity. The analysis also revealed two dissociable sets of brain regions associated with each boundary type (see Figure S1)."
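
      As a rough illustration of the nested-model comparison described above, the following sketch fits an error-only, an uncertainty-only, and a combined FIR model to simulated data and computes Gaussian likelihood-ratio statistics. Everything here is an assumption made for illustration: the boundary positions, lag window, noise model, and helper names (`fir_columns`, `ols_rss`, `lr_statistic`) are hypothetical and are not taken from the authors' code.

```python
import math
import random


def fir_columns(n_tp, boundaries, lags):
    """One indicator column per lag: column j is 1 at timepoints boundary + lags[j]."""
    cols = []
    for lag in lags:
        col = [0.0] * n_tp
        for b in boundaries:
            if 0 <= b + lag < n_tp:
                col[b + lag] = 1.0
        cols.append(col)
    return cols


def ols_rss(cols, y):
    """Residual sum of squares of y regressed on the given columns (normal equations)."""
    k, n = len(cols), len(y)
    # Augmented matrix [X'X | X'y].
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         + [sum(cols[i][t] * y[t] for t in range(n))] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[t] - sum(beta[j] * cols[j][t] for j in range(k))) ** 2
               for t in range(n))


def lr_statistic(rss_reduced, rss_full, n):
    """Likelihood-ratio statistic for nested Gaussian OLS models."""
    return n * math.log(rss_reduced / rss_full)


rng = random.Random(0)
n_tp, lags = 400, [-2, -1, 0, 1, 2]
error_b = [40, 120, 200, 280, 360]   # hypothetical error-driven boundaries
uncert_b = [40, 90, 200, 250, 330]   # hypothetical, partly coinciding with the above
intercept = [1.0] * n_tp
E = fir_columns(n_tp, error_b, lags)
U = fir_columns(n_tp, uncert_b, lags)

# Simulated dissimilarity that responds only to uncertainty boundaries (lag 0).
y = [rng.gauss(0.0, 1.0) + 5.0 * U[2][t] for t in range(n_tp)]

rss_error = ols_rss([intercept] + E, y)
rss_uncert = ols_rss([intercept] + U, y)
rss_combined = ols_rss([intercept] + E + U, y)

# Combined vs error-only: unique contribution of uncertainty boundaries.
lr_unique_uncertainty = lr_statistic(rss_error, rss_combined, n_tp)
# Combined vs uncertainty-only: unique contribution of error boundaries.
lr_unique_error = lr_statistic(rss_uncert, rss_combined, n_tp)
df = len(lags)  # number of columns added in each comparison
```

      Each statistic would be compared against a chi-square distribution with `df` degrees of freedom; that step is left out to keep the sketch within the standard library.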

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      We clarified how the FIR baseline is estimated in the methods section. Specifically, we now explain that the FIR coefficients should be interpreted relative to a reference level, which reflects the expected dissimilarity when timepoints are far from an event boundary. This makes it clear what serves as the comparison point for observed increases or decreases in dissimilarity.

      "The coefficients from the FIR model indicate changes relative to baseline, which can be conceptualized as the expected value when far from event boundaries."
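
      To make the notion of a baseline concrete, here is a minimal sketch under a simplifying assumption that the windows around boundaries do not overlap. In that special case, a least-squares FIR fit with an intercept reduces to group means: the intercept is the mean of the timepoints outside every window, and each lag's coefficient is the mean at that lag minus the intercept. The helper name and the toy series are hypothetical.

```python
from statistics import mean


def fir_by_group_means(y, boundaries, lags):
    """FIR estimates for non-overlapping windows.

    Returns (baseline, {lag: coefficient}): baseline is the mean of y at
    timepoints outside every window; each coefficient is the mean of y at
    that lag minus the baseline.
    """
    by_lag = {lag: [] for lag in lags}
    in_window = set()
    for b in boundaries:
        for lag in lags:
            t = b + lag
            if 0 <= t < len(y):
                if t in in_window:
                    raise ValueError("windows overlap; a full regression is needed")
                in_window.add(t)
                by_lag[lag].append(y[t])
    baseline = mean(v for t, v in enumerate(y) if t not in in_window)
    coefs = {lag: mean(vals) - baseline for lag, vals in by_lag.items()}
    return baseline, coefs


# Toy series: dissimilarity is 1.0 far from boundaries and rises to 1.5 one
# sample before each (hypothetical) boundary.
y = [1.0] * 30
boundaries = [10, 20]
for b in boundaries:
    y[b - 1] = 1.5

baseline, coefs = fir_by_group_means(y, boundaries, lags=[-1, 0, 1])
```

      A positive coefficient here means dissimilarity above the far-from-boundary level at that lag. When windows do overlap, as they may with short events, the group-mean shortcut no longer applies and a regression over all lag columns is needed.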

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      This is related to Reviewer 2's comment and is addressed below.
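
      One way to quantify the concern in comment (4) is to summarise the spacing between consecutive boundaries and count how many gaps are shorter than the analysis half-window, since in those cases the window around one boundary contains its neighbour. The sketch below assumes hypothetical boundary times in seconds and a hypothetical helper `interval_summary`; it is not drawn from the authors' analysis.

```python
from statistics import mean, stdev


def interval_summary(boundary_times, half_window):
    """Summarise gaps between consecutive boundaries relative to a +/- half_window window."""
    times = sorted(boundary_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    n_short = sum(gap < half_window for gap in gaps)
    return {
        "mean_gap": mean(gaps),
        "sd_gap": stdev(gaps),
        # Share of gaps in which the window reaches the neighbouring boundary.
        "share_short_gaps": n_short / len(gaps),
    }


# Hypothetical boundary times (seconds), for illustration only.
times = [0, 20, 45, 60, 90]
summary = interval_summary(times, half_window=20)
```

      Reporting the mean and spread of the gaps alongside the share of short gaps would show how often pre- and post-boundary timepoints fall within adjacent events.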

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

      We thank the reviewer for this advice on how better to set the context for the different potential outcomes of the study. We expanded both the introduction and discussion to better set up expectations for neural pattern shifts and to interpret what these shifts may reflect. In the introduction, we now describe prior findings showing that sensory regions tend to update more quickly than higher-order multimodal regions (Baldassano et al., 2017; Geerligs et al., 2021, 2022), and we highlight that it remains unclear whether higher-order updates precede or follow those in lower-order regions. We also note that our analytic approach is well-suited to address this open question. In the discussion, we then interpret our results in light of this framework. Specifically, we describe how we observed early shifts in higher-order areas such as anterior temporal and prefrontal cortex, followed by shifts in parietal and dorsal attention regions closer to event boundaries. This pattern runs counter to the traditional bottom-up temporal hierarchy view and instead supports a model of top-down updating, where high-level representations are updated first and subsequently influence lower-level processing (Friston, 2005; Kuperberg, 2021). To make this interpretation concrete, we added an example: in a narrative where a goal is reached midway—for instance, a mystery solved before the story formally ends—higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions. Finally, we note that the widespread stabilization of neural patterns after boundaries may signal the establishment of a new event model.

      Excerpt from Introduction:

      “More recently, multivariate approaches have provided insights into neural representations during event segmentation. One prominent approach uses hidden Markov models (HMMs) to detect moments when the brain switches from one stable activity pattern to another (Baldassano et al., 2017) during movie viewing; these periods of relative stability were referred to as "neural states" to distinguish them from subjectively perceived events. Sensory regions like visual and auditory cortex showed faster transitions between neural states. Multi-modal regions like the posterior medial cortex, angular gyrus, and intraparietal sulcus showed slower neural state shifts, and these shifts aligned with subjectively reported event boundaries. Geerligs et al. (2021, 2022) employed a different analytical approach called Greedy State Boundary Search (GSBS) to identify neural state boundaries. Their findings echoed the HMM results: short-lived neural states were observed in early sensory areas (visual, auditory, and somatosensory cortex), while longer-lasting states appeared in multi-modal regions, including the angular gyrus, posterior middle/inferior temporal cortex, precuneus, anterior temporal pole, and anterior insula. Particularly prolonged states were found in higher-order regions such as lateral and medial prefrontal cortex.

      The previous evidence about evoked responses at event boundaries indicates that these are dynamic phenomena evolving over many seconds, with different brain areas showing different dynamics (Ben-Yakov & Henson, 2018; Burunat et al., 2024; Kurby & Zacks, 2018; Speer et al., 2007; Zacks, 2010). Less is known about the dynamics of pattern shifts at event boundaries (e.g., whether shifts observed in higher-order regions precede or follow shifts observed in lower-level regions), because the HMM and GSBS analysis methods do not directly provide moment-by-moment measures of pattern shifts. Both the spatial and temporal aspects of evoked responses and pattern shifts at event boundaries have the potential to provide evidence about two potential control processes (error-driven and uncertainty-driven) for event model updating.”

      Excerpt from Discussion:

      “We first characterized the neural signatures of human event segmentation by examining both univariate activity changes and multivariate pattern changes around subjectively identified event boundaries. Using multivariate pattern dissimilarity, we observed a structured progression of neural reconfiguration surrounding human-identified event boundaries. The largest pattern shifts were observed near event boundaries (~4.5s before) in dorsal attention and parietal regions; these correspond with regions identified by Geerligs et al. (2022) as shifting their patterns on a fast to intermediate timescale. We also observed smaller pattern shifts roughly 12 seconds prior to event boundaries in higher-order regions within anterior temporal cortex and prefrontal cortex, which are slow-changing regions identified by Geerligs et al. (2022). This is puzzling. One prevalent proposal, based on the idea of a cortical hierarchy of increasing temporal receptive windows (TRWs), suggests that higher-order regions should update representations after lower-order regions do (Chang et al., 2021). In this view, areas with shorter TRWs (e.g., word-level processors) pass information upward, where it is integrated into progressively larger narrative units (phrases, sentences, events). This proposal predicts that neural shifts in higher-order regions follow those in lower-order regions. By contrast, our findings indicate the opposite sequence. Our findings suggest that the brain might engage in top-down event representation updating, with changes in coarser-grain representations propagating downward to influence finer-grain representations (Friston, 2005; Kuperberg, 2021). For example, in a narrative where the main goal is achieved midway (such as a detective solving a mystery before the story formally ends), higher-order regions might update the overarching event representation at that point, and this updated model could then cascade down to reconfigure how lower-level regions process the remaining sensory and contextual details. In the period after a boundary (around +12 seconds), we found widespread stabilization of neural patterns across the brain, suggesting the establishment of a new event model. Future work could focus on understanding the mechanisms behind the temporal progression of neural pattern changes around event boundaries.”

      Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      We thank the reviewer for their support for our use of open science practices, and for their appreciation of the importance of incorporating prediction uncertainty into models of event comprehension.

      Weaknesses:

      The data presented is limited to the cortex, and subcortical contributions would be interesting to explore. Further, the temporal window around event boundaries of 20 seconds is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively distal from event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two different types of boundaries, which would be valuable.

      The fact that we observed neural pattern shifts well before boundaries was indeed unexpected, and we now offer a more extensive interpretation in the discussion section. Specifically, we added text noting that shifts emerged in higher-order anterior temporal and prefrontal regions roughly 12 seconds before boundaries, whereas shifts occurred in lower-level dorsal attention and parietal regions closer to boundaries. This sequence contrasts with the traditional bottom-up temporal hierarchy view and instead suggests a possible top-down updating mechanism, in which higher-order representations reorganize first and propagate changes to lower-level areas (Friston, 2005; Kuperberg, 2021). (See excerpt for Reviewer 1’s comment #5.)

      With respect to univariate activity, we did not find strong differences between error-driven and uncertainty-driven boundaries. This makes the multivariate analyses particularly informative for detecting differences in neural pattern dynamics. To support further exploration, we have also shared the temporal progression of univariate BOLD responses on OpenNeuro (BOLD_coefficients_brain_animation_pe_SEM_bold.html and BOLD_coefficients_brain_animation_uncertainty_SEM_bold.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      We thank the reviewer for their thoughtful and supportive comments, particularly regarding the use of the computational model and the analysis approaches.

      Weaknesses:

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r=0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      We appreciate this point. It is closely related to Reviewer 1's comment 2; please refer to our response above.

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      We share the intuition that increases in uncertainty are early indicators of a boundary and that errors tend to occur later. If that were the case, we would expect a lag between prediction uncertainty and prediction error. We examined the lagged correlation between prediction uncertainty and prediction error and found that the optimal lag is 0 for both the uncertainty-driven and error-driven models. This indicates that when prediction uncertainty rises, prediction error rises at the same time.

      Author response image 1.
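
The lagged-correlation check described in this response can be sketched as follows. This is a minimal illustration on synthetic signals, not the authors' analysis code: the function names (`pearson`, `lagged_correlation`, `best_lag`) and the toy sine-wave data are assumptions introduced here. A positive best lag would mean the second series trails the first; a best lag of 0 means they rise and fall together.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: len(y) + lag]
        result[lag] = pearson(xs, ys)
    return result


def best_lag(x, y, max_lag):
    """Return the lag with the highest correlation."""
    corr = lagged_correlation(x, y, max_lag)
    return max(corr, key=corr.get)


# Synthetic stand-ins for the two model signals (hypothetical data).
uncertainty = [math.sin(0.3 * t) for t in range(200)]
error_same_time = list(uncertainty)             # rises together with uncertainty
error_delayed = [0.0] * 3 + uncertainty[:-3]    # trails uncertainty by 3 samples

print(best_lag(uncertainty, error_same_time, max_lag=10))
print(best_lag(uncertainty, error_delayed, max_lag=10))
```

An optimal lag of 0, as reported above, corresponds to the first case: no shift of one series relative to the other improves the correlation.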

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      We have added details about the distribution of event lengths. Specifically, we now report that the mean length of subjectively identified events was 21.4 seconds (median 22.2 s, SD 16.1 s). For model-derived boundaries, the average event lengths were 28.96 seconds for the uncertainty-driven model and 24.7 seconds for the error-driven model.

      "For each activity, a separate group of 30 participants had previously segmented each movie to identify fine-grained event boundaries (Bezdek et al., 2022). The mean event length was 21.4 s (median 22.2 s, SD 16.1 s). Mean event lengths for the uncertainty-driven and error-driven models were 28.96 s and 24.7 s, respectively (Nguyen et al., 2024)."

      (4) Given that there is an early onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase at 12 seconds before the boundaries occur? Or are there other markers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase already 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty, which are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether these are general markers of error and uncertainty.

      We appreciate this point; it is similar to reviewer 2’s comment 2. Please see our response to that comment above.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      Our analyses use ±20 s FIR windows, and the key effects we report include shifts ~12s before boundaries in higher-order cortex and ~4.5s pre-boundary in dorsal attention/parietal areas. Given the literature above, region-dependent BOLD delays are much smaller (~1–2s) than the temporal structure we observe (Taylor et al., 2018), making it unlikely that HRF lag alone explains our multi-second, region-specific progression.

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

      For page length purposes, we did not include all timepoints. We uploaded a brain animation of all timepoints and coefficients for each parcel on OpenNeuro (PATTERN_coefficients_brain_animation_human_fine_pattern.html and PATTERN_coefficients_lines_human_fine.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      References

      Taylor, A. J., Kim, J. H., & Ress, D. (2018). Characterization of the hemodynamic response function across the majority of human cerebral cortex. NeuroImage, 173, 322–331. https://doi.org/10.1016/j.neuroimage.2018.02.061

    1. 8.2. Data From the Reddit API: When we’ve been accessing Reddit through Python and the “PRAW” code library. The praw code library works by sending requests across the internet to Reddit, using what is called an “application programming interface” or API for short. APIs have a set of rules for what requests you can make, what happens when you make the request, and what information you can get back. If you are interested in learning more about what you can do with praw and what information you can get back, you can look at the official documentation for those. But be warned they are not organized in a friendly way for newcomers and take some getting used to to figure out what these documentation pages are talking about. So, if you are interested, you can look at the praw library documentation to find out what the library can do (again, not organized in a beginner-friendly way). You can learn a little more by clicking on the praw models and finding a list of the types of data for each of the models, and a list of functions (i.e., actions) you can do with them. You can also look up information on the data that you can get from the Reddit API by looking at the Reddit API Documentation. The Reddit API lets you access just some of the data that Reddit tracks, but Reddit and other social media platforms track much more than they let you have access to.

      This section helped me better understand what the Reddit API actually is and how PRAW works behind the scenes. I didn’t really think about the fact that it’s just sending requests to Reddit and getting specific data back based on rules. The warning about the documentation being hard to read feels very accurate, because most official docs are kind of confusing for beginners. It was also interesting to realize that Reddit collects way more data than what the API lets us see.
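
To make the request-and-response idea concrete, here is a small sketch that needs no network access. It builds the kind of URL a client might request and then reads post titles out of a hand-written sample reply shaped like a Reddit "Listing". The URL pattern, the helper names (`build_listing_url`, `extract_posts`), and the sample data are assumptions for illustration only; a library such as praw handles authentication and these details for you, and its real interface is described in its own documentation.

```python
import json
from urllib.parse import urlencode


def build_listing_url(subreddit, sort="hot", limit=5):
    """Build a request URL for a subreddit listing (illustrative pattern)."""
    query = urlencode({"limit": limit})
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{query}"


# Hand-written sample reply (hypothetical data) shaped like a listing:
# a "Listing" wraps "children", and each child wraps one post's "data".
SAMPLE_RESPONSE = json.dumps({
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "First post", "score": 12}},
            {"kind": "t3", "data": {"title": "Second post", "score": 3}},
        ]
    },
})


def extract_posts(response_text):
    """Pull only the fields we care about out of the reply."""
    listing = json.loads(response_text)
    return [
        {"title": child["data"]["title"], "score": child["data"]["score"]}
        for child in listing["data"]["children"]
    ]


print(build_listing_url("python", limit=2))
for post in extract_posts(SAMPLE_RESPONSE):
    print(post["title"], post["score"])
```

The point of the sketch is the one made in the passage: the API decides which requests are allowed and which fields come back, so the reply contains only what the platform chooses to expose.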

    2. 8.2. Data From the Reddit API: When we’ve been accessing Reddit through Python and the “PRAW” code library. The praw code library works by sending requests across the internet to Reddit, using what is called an “application programming interface” or API for short. APIs have a set of rules for what requests you can make, what happens when you make the request, and what information you can get back. If you are interested in learning more about what you can do with praw and what information you can get back, you can look at the official documentation for those. But be warned they are not organized in a friendly way for newcomers and take some getting used to to figure out what these documentation pages are talking about. So, if you are interested, you can look at the praw library documentation to find out what the library can do (again, not organized in a beginner-friendly way). You can learn a little more by clicking on the praw models and finding a list of the types of data for each of the models, and a list of functions (i.e., actions) you can do with them. You can also look up information on the data that you can get from the Reddit API by looking at the Reddit API Documentation. The Reddit API lets you access just some of the data that Reddit tracks, but Reddit and other social media platforms track much more than they let you have access to.

      This section shows how powerful—and dangerous—data mining can be when patterns are taken out of context. The examples make it clear that just because data lines up does not mean it reveals a true cause, especially with spurious correlations. It highlights how easily data can be used to support misleading or biased conclusions, which is especially concerning when these inferences affect real people’s identities and social outcomes.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer 1

      Minor

      The main substance of my previous comment I suppose targeted a deeper issue - namely whether such a result is reflecting a resolution to a 'neural prediction' puzzle or a 'perceptual prediction' puzzle. Of course, these results tell us a great deal about a potential resolution for how dampening and sharpening might co-exist in the brain - but in the absence of corresponding perceptual effects (or a lack of correlation between neural and perceptual variables - as outlined in this revision) I do wonder if any claims about implications for perception might need moderation or caveating. To be honest, I don't think the authors *need* to make any more changes along these lines for this paper to be acceptable - it is more an issue they might wish to consider themselves when contextualizing their findings.

      Thank you for the thoughtful comment. We have now added a caveat to the relevant section of the discussion to make it clearer that we are discussing neural results, not perceptual results (p.20, lines 378-379).

      I am also happy with the changes that the authors have made justifying which claims can and cannot be made based on a statistical decoding test against 'chance' in a single condition using t-tests. I was perhaps a little unclear when I spoke about 'comparisons against 0' in my original review, when the key issue (as the authors have intuited!) is about comparisons against 'chance' (where e.g., 0% decoding above chance is the same thing as 'chance'!). The authors are of course correct in the amendment they have made on p.29 to make clear this is a 'fixed effects analysis' - though I still worry this could be a little cryptic for the average reader. I am not suggesting that the authors run more analyses, or revise any conclusions, but I think it would be more transparent if a note was added along the lines of "while the fixed effects approach (one-sample t-test) enables us to establish whether some consistent informative patterns are detectable in these particular subjects, the results from our paired t-tests support inference to the wider population".

      This sentence has been added for increased transparency (p. 27, lines 544-547).

      Reviewer 3

      Major

      (1) In the previous round of comments, I noted that: "I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible". The authors responded: "we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focused our further analyses on bins 1-2". However, I do not see how this new analysis addresses the concern that the conclusion highlights differences in decoding performance between bins 1 and 2, yet no contrast between these bins is performed. While I appreciate the addition of the new model, in my current understanding it does not solve the problem I raised. I still believe that if the authors wish to conclude that an effect differs between two bins they must contrast these directly and/or use a different appropriate analysis approach.

      Relatedly, the logarithmic model fitting and how it justifies the focus on analysis bin 1-2 needs to be explained better, especially the rationale of the analysis, the choice of parameters (e.g., why logarithmic, why change of logarithmic fit < 0.1% as criterion, etc), and why certain inferences follow from this analysis. Also, the reporting of the associated results seems rather sparse in the current iteration of the manuscript.

      We thank the reviewer for this important point. Following your suggestion, we conducted additional post-hoc tests directly comparing the first and second bins. We found significant differences between bins in the invalid trials, but not the valid trials, suggesting that sharpening/dampening effects are condition specific. This is discussed in the manuscript on p.14, lines 268-271; p.15, 280-284; p.20, lines 382-386.

      A logarithmic analysis was chosen as learning is usually found to be a nonlinear process; learning effects occur rapidly before stabilising relatively early, as seen in Fig. 2D. This is consistent with other research which found that logarithmic fits efficiently describe learning curves in statistical learning (Kang et al., 2023; Siegelman et al., 2018; Choi et al., 2020). By utilising a change of logarithmic fit at <0.1% as a criterion, it is ensured that virtually zero learning took place after that point, allowing us to focus our analysis on learning effects as they developed and providing a more accurate model of representational change. This is explained in the manuscript on p.13, lines 250-251; p.27-28, lines 557-563.
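
The plateau criterion described here can be sketched as follows. This is one possible reading, not the authors' code: it assumes the model has the form y = a + b·ln(trial), that it is fitted by ordinary least squares, and that "change < 0.1%" means the relative change of the fitted curve from one trial to the next. The function names and the synthetic learning curve are hypothetical.

```python
import math


def fit_log(trials, values):
    """Least-squares fit of values = a + b * ln(trial); returns (a, b)."""
    log_t = [math.log(t) for t in trials]
    n = len(log_t)
    mean_x = sum(log_t) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_t, values))
    sxx = sum((x - mean_x) ** 2 for x in log_t)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def relative_change(a, b, trial):
    """Relative change of the fitted curve between trial and trial + 1."""
    now = a + b * math.log(trial)
    nxt = a + b * math.log(trial + 1)
    return abs(nxt - now) / abs(now)


def plateau_trial(a, b, max_trial, tol=0.001):
    """First trial at which the fitted curve changes by less than tol (0.1%)."""
    for trial in range(1, max_trial):
        if relative_change(a, b, trial) < tol:
            return trial
    return None


# Synthetic learning curve (hypothetical): fast early gains, then flattening.
trials = list(range(1, 301))
benefit = [0.5 + 0.02 * math.log(t) for t in trials]

a, b = fit_log(trials, benefit)
cutoff = plateau_trial(a, b, max_trial=300)
print(f"a={a:.3f}, b={b:.3f}, plateau at trial {cutoff}")
```

Because a logarithmic curve flattens steadily, the relative change shrinks with every trial, so the first trial that meets the tolerance marks the point after which little further change is predicted by the fit.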

      (2) A critical point the authors raise is that they investigate the buildup of expectations during training. They go on to show that the dampening effect disappears quickly, concluding: "the decoding benefit of invalid predictions [...] disappeared after approximately 15 minutes (or 50 trials per condition)". Maybe the authors can correct me, but my best understanding is as follows: Each bin has 50 trials per condition. The 2:1 condition has 4 leading images, this would mean ~12 trials per leading stimulus, 25% of which are unexpected, so ~9 expected trials per pair. Bin 1 represents the first time the participants see the associations. Therefore, the conclusion is that participants learn the associations so rapidly that ~9 expected trials per pair suffice to not only learn the expectations (in a probabilistic context) but learn them sufficiently well such that they result in a significant decoding difference in that same bin. If so, this would seem surprisingly fast, given that participants learn by means of incidental statistical learning (i.e. they were not informed about the statistical regularities). I acknowledge that we do not know how quickly the dampening/sharpening effects develop, however surprising results should be accompanied with a critical evaluation and exceptionally strong evidence (see point 1). Consider for example the following alternative account to explain these results. Category pairs were fixed across and within participants, i.e. the same leading image categories always predicted the same trailing image categories for all participants. Some category pairings will necessarily result in a larger representational overlap (i.e., visual similarity, etc.) and hence differences in decoding accuracy due to adaptation and related effects. For example, house → barn will result in a different decoding performance compared to coffee cup → barn, simply due to the larger visual and semantic similarity between house and barn compared to coffee cup and barn. These effects should occur upon first stimulus presentation, independent of statistical learning, and may attenuate over time e.g., due to increasing familiarity with the categories (i.e., an overall attenuation leading to smaller between condition differences) or pairs.

      We apologise for the confusion; there are 50 expected trials per bin per condition. The trial breakdown is as follows. Each participant completed 1728 trials, split equally across 3 mappings (two 2:1 maps and one 1:2 map), giving 1152 trials in the 2:1 mapping. Stimuli were expected in 75% of trials (864), leaving 216 per bin, and 54 per leading image in each bin. We have clarified this in the manuscript (p.14, line 267; p.15, line 280). This is in line with similar studies in the field (e.g. Han et al., 2019).
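
The arithmetic in this breakdown can be checked with a few lines. The sketch below uses only the figures stated in the response, plus two values that are assumptions here: four bins (inferred from 864 / 216) and four leading images (taken from the reviewer's comment). The function and key names are hypothetical.

```python
def expected_trial_breakdown(
    total_trials=1728,
    n_mappings=3,
    n_two_to_one_maps=2,
    p_expected=0.75,
    n_bins=4,            # assumption: inferred from 864 / 216
    n_leading_images=4,  # assumption: from the reviewer's comment
):
    """Reproduce the stated trial counts step by step."""
    per_mapping = total_trials // n_mappings           # trials per mapping
    two_to_one = per_mapping * n_two_to_one_maps       # trials in 2:1 mappings
    expected = round(two_to_one * p_expected)          # expected (valid) trials
    per_bin = expected // n_bins                       # expected trials per bin
    per_leading = per_bin // n_leading_images          # per leading image per bin
    return {
        "per_mapping": per_mapping,
        "two_to_one": two_to_one,
        "expected": expected,
        "per_bin": per_bin,
        "per_leading_image": per_leading,
    }


breakdown = expected_trial_breakdown()
for name, count in breakdown.items():
    print(f"{name}: {count}")
```

Under these assumptions the counts match the response: 576 trials per mapping, 1152 in the 2:1 mappings, 864 expected, 216 per bin, and 54 per leading image in each bin.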

      (3) In response to my previous comment, why the authors think their study may have found different results compared to multiple previous studies (e.g. Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011), particularly the sharpening to dampening switch, the authors emphasize the use of non-repeated stimuli (no repetition suppression and no familiarity confound) in their design. However, I fail to see how familiarity or RS could account for the absence of sharpening/dampening inversion in previous studies.

      First, if the authors' argument is about stimulus novelty and familiarity as described by Feuerriegel et al., 2021, I believe this point does not apply to the cited studies. Feuerriegel et al., 2021 note: "Relative stimulus novelty can be an important confound in situations where expected stimulus identities are presented often within an experiment, but neutral or surprising stimuli are presented only rarely", which indeed is a critical confound. However, none of the studies (Han et al., 2019; Richter et al., 2018; Kumar et al., 2017; Meyer and Olson, 2011) contained this confound, because all stimuli served as expected and unexpected stimuli, with the expectation status solely determined by the preceding cue. Thus, participants were equally familiar with the images across expectation conditions.

      Second, for a similar reason the authors' argument for RS accounting for the different results does not hold either in my opinion. Again, as Feuerriegel et al. 2021 correctly point out: "Adaptation-related effects can mimic ES when the expected stimuli are a repetition of the last-seen stimulus or have been encountered more recently than stimuli in neutral expectation conditions." However, it is critical to consider the precise design of previous studies. Take again the example of Han et al., 2019; Kumar et al., 2017; and Meyer and Olson, 2011: to my knowledge, none of these studies contained manipulations that would result in a more frequent or recent repetition of any specific stimulus in the expected compared to unexpected condition. The crucial manipulation in all these previous studies is not that a single stimulus or stimulus feature (which could be subject to familiarity or RS) determines the expectation status, but rather the transitional probability (i.e. cue-stimulus pairing) of a particular stimulus given the cue. Therefore, unless I am missing something critical, simple RS seems unlikely to differ between expectation conditions in the previous studies and hence seems implausible to account for differences in results compared to the current study.

      Moreover, studies cited by the authors (e.g. Todorovic & de Lange, 2012) showed that RS and ES are separable in time, again making me wonder how avoiding stimulus repetition should account for the difference in the present study compared to previous ones. I am happy to be corrected in my understanding, but with the currently provided arguments by the authors I do not see how RS and familiarity can account for the discrepancy in results.

      The reviewer is correct in that the studies cited (Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011) ensure that participants are equally familiar with the images across expectation conditions. Where the present study differs is that participants are not familiar with individual exemplars at all. Han et al., 2019 used a pool of 30 individual images, and subjects underwent exposure sessions lasting two hours each daily for 34 days prior to testing. Kumar et al., 2017 used a pool of 12 images with subjects being exposed to each sequential pair 816 times over the course of the training period. Meyer and Olson, 2011 used pure tones at five different pitch levels. While familiarity of stimuli across conditions was controlled for in these studies in the sense that familiarity was constant across conditions, novelty was not controlled for. The present study uses a pool of ~3500 images, which are unrepeated across trials.

      Feuerriegel et al., 2021 also point out: “There are also effects of adaptation that are dependent on the recent stimulation history extending beyond the last encountered stimulus and long-lag repetition effects that occur when the first and second presentation of a stimulus is separated by tens or even hundreds of intervening images”. Bearing this in mind, and given the very small pool of stimuli being used by Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011, it stands to reason that these studies may still have built-in but unaccounted for effects relating to the repetition of exemplars. Thus, our avoidance of those possible confounds, in addition to foregoing any prior training, may elicit differing results. Furthermore, as pointed out by Walsh et al. 2020, methodological heterogeneity (such as subject training) can produce contrasting results as PP makes divergent predictions regarding the properties of prediction error given different permutations of variables such as training, transitional probabilities, and conditional probabilities. In our case, the use of differing methodology was intentional. These issues have been discussed in more detail on p.5, lines 112-115; p.19, lines 368-377; p.20, lines 378-379.

      Minor

      (1) The authors note in their reply to my previous questions that: "As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021)". This might be a lack of understanding on my side, but how are concerns about the reliability of ES, as outlined by Feuerriegel et al. (2021), an argument for restricting analyses to 1 EEG channel (Oz)? Could one not argue equally well that precisely because of these concerns we should be less selective and instead average across multiple (occipital) channels to improve the reliability of results?

      The reviewer is correct in suggesting that a cluster of occipital electrodes may be more reliable than reporting one single electrode. We have amended the analysis to examine electrodes Oz, O1, and O2 (p.9, lines 187-188; p.11, lines 197-201).

      (2) The authors provide a github link for the dataset and code. However, I doubt that github is a suitable location to share EEG data (which at present I also cannot find linked in the github repo). Do the authors plan to share the EEG data and if so where?

      Thank you for bringing this to my attention. EEG data has now been uploaded at osf.io/x7ydf and linked to the github repository (p.28, lines 569-570).

      (3) The figure text could benefit from additional information; e.g. Fig.1C and Fig.3 do not clarify what the asterisk indicates; p < ? with or without multiple comparison correction?

      Thank you for pointing out this oversight; the figure texts have been amended (p. 9, line 168; p.16, line 289).

    1. Reviewer #1 (Public review):

      The authors sought to investigate the role of adaptation in supporting object recognition. In particular, the extent to which adaptation to noise improves subsequent recognition of objects embedded in the same or similar noise, and how this interacts with target contrast. The authors approach this question using a combination of psychophysics, electroencephalography, and deep neural networks. They find better behavioural performance and multivariate decoding of stimuli preceded by noise, suggesting a beneficial effect of adaptation to noise. The neural network analysis seeks to provide a deeper explanation of the results by comparing how well different adaptation mechanisms capture the empirical behavioural results. The results show that models incorporating intrinsic adaptation mechanisms, such as additive suppression and divisive normalisation, capture the behavioural results better than those that incorporate recurrent interactions. The study has the potential to provide interesting insights into adaptation, but there are alternative (arguably more parsimonious) explanations for the results that have not been refuted (or even recognised) in the manuscript. If these confounds can be compellingly addressed, then I expect the results would be of interest to a broad range of readers.

      The study uses a multi-modal approach, which provides a rich characterisation of the phenomenon. The methods are described clearly, and the accompanying code and data are made publicly available. The comparison between univariate and multivariate analyses is interesting, and the application of neural networks to distinguish between different models of adaptation seems quite promising.

      There are several concerning confounding factors that need to be addressed before the results can be meaningfully interpreted. In particular, differences in behavioural accuracy may be explained by a simple change detection mechanism in the "same noise" condition, and temporal cuing by the "adaptor" stimulus may explain differences in reaction time. Similarly, interference between event-related potentials may explain the univariate EEG results, and biased decoder training may explain the multivariate results. Thus, it is currently unclear if any of the results reflect adaptation.

      My main concerns relate to how adaptation is induced and how differences between conditions are interpreted. The adaptation period is only 1.5 s. Although brief adaptors (~1 s) can produce stimulus history effects, it is unclear whether these reflect the same mechanisms as those observed with standard, longer adaptation durations (e.g., 10-30 s). Prior EEG work on visual adaptation using longer adaptors has shown that feature-specific effects emerge very early (<100 ms) after test onset in both univariate and multivariate responses (Rideaux et al., 2023, PNAS). In contrast, the present study finds no difference between same and different adaptor conditions until much later (>300 ms). These later effects likely reflect cognitive processes such as template matching or decision-making, rather than sensory adaptation. Although early differences appear between blank and adaptor conditions, these could be explained by interactions between ERPs elicited by adaptor onset/offset and those elicited by the test stimulus; therefore, they cannot be attributed to adaptation. This contradicts the statement in the Discussion that "Our EEG measurements show clear evidence of repetition suppression, in the form of reduced responses to the repeated noise pattern early in time."

      A second concern is the brief inter-stimulus interval. The adaptor is shown for 1.5 s, followed by only a 134 ms blank before the target. When the "adaptor" and test noise are identical, improved performance could simply arise from detecting the pixels that change, namely, those forming the target number. Such change detection does not require adaptation; even simple motion detector units would suffice. If the blank period were longer (beyond the temporal window of motion detectors), then improved performance would more convincingly reflect adaptation. Given the very short blank, however, a more parsimonious explanation for the behavioural effect in the same-noise condition is that change detection mechanisms isolate the target.

      Differences between the blank and adaptor conditions may also be explained by temporal cueing. In the noise conditions, the noise reliably signals the upcoming target time, whereas the blank condition provides no such cue. Given the variable inter-trial interval and the brief target presentation, this temporal cue would strongly facilitate target perception. This account is consistent with the reaction time results: both adaptor conditions produce faster reaction times than the blank condition, but do not differ from each other.

      The decoding analyses are also difficult to interpret, given the training-testing protocol. All trials from the three main conditions (blank, same, different) were used to train the classifier, and then held-out trials (all from one condition) were decoded. Because ERPs in the adaptor conditions differ substantially from those in the blank condition, and because there are twice as many adaptor trials, the classifier is biased toward patterns from the adaptor conditions and will naturally perform worse on blank trials. To compare decoding accuracy meaningfully across conditions, the classifier should be trained on a separate unbiased dataset (e.g., the "clean" data), or each condition should be trained and tested separately using cross-fold validation.

    1. eLife Assessment

      This study presents a valuable and well-documented computational pipeline for the scalable analysis and spike sorting of large extracellular electrophysiology datasets, with particular relevance for high-density recordings such as Neuropixels. The authors demonstrate the pipeline's utility for benchmarking spike sorter performance and evaluating the effects of data compression, supported by thorough testing, clear figures, and openly available code. The workflow is reproducible, portable, and practical, providing concrete guidance on computational cost and runtime. Overall, the evidence supporting the pipeline's performance and output quality is compelling, and this work will be of broad interest to the systems neuroscience community.

    2. Reviewer #1 (Public review):

      Summary:

      Extracellular electrophysiology datasets are growing in both number and size, and recordings with thousands of sites per animal are now commonplace. Analyzing these datasets to extract the activity of single neurons (spike sorting) is challenging: signal-to-noise is low, the analysis is computationally expensive, and small changes in analysis parameters and code can alter the output. The authors address the problem of volume by packaging the well-characterized SpikeInterface pipeline in a framework that can distribute individual sorting jobs across many workers in a compute cluster or cloud environment. Reproducibility is ensured by running containerized versions of the processing components.

      The authors apply the pipeline in two important examples. The first is a thorough study comparing the performance of two widely used spike-sorting algorithms (Kilosort 2.5 and Kilosort 4). They use hybrid datasets created by injecting measured spike waveforms (templates) into existing recordings, adjusting those waveforms according to the measured drift in the recording. These hybrid ground truth datasets preserve the complex noise and background of the original recording. Similar to the original Kilosort 4 paper, which uses a different method for creating ground truth datasets that include drift, the authors find Kilosort 4 significantly outperforms Kilosort 2.5. The second example measures the impact of compression of raw data on spike sorting with Kilosort 4, showing that accuracy, precision, and recall of the ground truth units are not significantly impacted even by lossy compression. As important as the individual results, these studies provide good models for measuring the impact of particular processing steps on the output of spike sorting.

      Strengths:

      The pipeline uses the Nextflow framework, which makes it adaptable to different job schedulers and environments. The high-level documentation is useful, and the GitHub code is well organized. The two example studies are thorough and well-designed, and address important questions in the analysis of extracellular electrophysiology data.

      Weaknesses:

      The pipeline is very complete, but also complex. The optimal workflow (for example, the choice of artifact removal or the best curation for data from a particular brain area or species) will vary from experiment to experiment. Therefore, a discussion of the adaptability of the pipeline in the "Limitations" section would be helpful for readers.

    3. Reviewer #2 (Public review):

      Summary:

      This work presents a reproducible, scalable workflow for spike sorting that leverages parallelization to handle large neural recording datasets. The authors introduce both a processing pipeline and a benchmarking framework that can run across different computing environments (workstations, HPC clusters, cloud). Key findings include demonstrating that Kilosort4 outperforms Kilosort2.5 and that 7× lossy compression has minimal impact on spike sorting performance while substantially reducing storage costs.

      Strengths:

      (1) Extremely high-quality figures with clear captions that effectively communicate complex workflow information.

      (2) Very detailed, well-written methods section providing thorough documentation.

      (3) Strong focus on reproducibility, scalability, modularity, and portability using established technologies (Nextflow, SpikeInterface, Code Ocean).

      (4) Pipeline publicly available on GitHub with documentation.

      (5) Clear cost analysis showing ~$5/hour for AWS processing with transparent breakdown.

      (6) Good overview of previous spike sorting benchmarking attempts in the introduction.

      (7) Practical value for the community by lowering barriers to processing large datasets.

      Weaknesses:

      No significant weaknesses were identified, although it is noted that the limitations section of the discussion could be expanded.

    1. Institutional Violence: Analysis and Legal and Practical Perspectives

      Executive Summary

      This briefing analyzes the many dimensions of institutional violence, drawing on combined expertise from law, public policy, and social science research.

      It finds that the notion of "institutional violence" is complex and marked by persistent legal ambiguity despite recent legislative progress.

      The term "institutional maltreatment" is often preferred because it highlights the asymmetric power relationship inherent between the institution and the user.

      The key points are as follows:

      1. An Incomplete Legal Definition: The law of 7 February 2022 introduced into the Code de l'Action Sociale et des Familles (art. L119-1) a definition of maltreatment that covers institutional origin.

      However, it does not specifically define what constitutes "institutional maltreatment", which leaves room for interpretation and makes such situations harder to classify and handle.

      2. A Poorly Quantified Phenomenon: There is a significant lack of public statistical data for measuring the extent of institutional violence in France.

      The available data nevertheless indicate that social and health sector professionals are heavily exposed to violence, at levels comparable to those of law enforcement, which points to a particularly difficult working climate.

      3. Shared Responsibilities: Combating institutional maltreatment cannot be limited to punishing individual misconduct.

      It involves multiple, complex chains of responsibility, implicating professionals, institutions, and more broadly society in its capacity to set thresholds of tolerance and protect the most vulnerable.

      4. The Crucial Importance of Organizational Support: A study conducted at the Ville de Paris shows that the well-being of social work professionals is not correlated with the number of violent acts they experience, but rather with the quality of perceived organizational support.

      "Moral distress", linked to a lack of leeway to respond adequately to users' needs, is also a determining factor.

      These findings identify team support and stronger professional autonomy as strategic levers for prevention.

      1. The Conceptual and Legal Framework of Institutional Violence

      1.1. Semantic Ambiguities: Violence vs. Maltreatment

      A fundamental distinction is drawn between the notions of "violence" and "maltreatment".

      Whereas violence can occur in any context, maltreatment is characterized by an asymmetric relationship of power or dependence between perpetrator and victim.

      In an institutional context, the victim is in a position of inferiority from which it is difficult to escape.

      Research perspective: The scientific literature suggests preferring the term "institutional maltreatment", because it implies a power relationship in which the victim is in a position of inferiority, which is particularly true for children in the care of child welfare services (aide sociale à l'enfance).

      Perspective of the people concerned:

      The advocacy report by ATD Quart Monde ("Stop à la maltraitance institutionnelle", September 2024) highlights the systemic nature of the phenomenon and the heavy exposure of people living in poverty.

      A quotation from this work illustrates the person's dependence on the institution:

      Institutional maltreatment can take two forms:

      1. A factual, objectifiable reality: Acts that may constitute criminal offences (violence, serious neglect).

      2. A subjective reality: The lived experience or perception of a person who considers themselves a victim, even in the absence of an established criminal offence.

      1.2. The Evolution of Law and Public Policy

      Recognition of institutional violence in law and public policy has advanced in successive fits and starts.

      | Year | Key Event | Contribution |
      | --- | --- | --- |
      | 1970 | "Opération pouponnière" launched by Simone Veil. | First targeted action on institutional violence against children, in parallel with Erving Goffman's sociological work on the "total institution". |
      | 2000s | Law of 2 January 2002. | Promotion of users' rights to offset the asymmetry of the relationship with the institution and encourage victims to speak out. |
      | 2008 | Constitutional reform. | Creation of the Défenseur des droits, notably enabling the Défenseur des enfants to receive individual complaints. |
      | 2022 | Law of 7 February 2022. | First legal definition of maltreatment in the social and medico-social sector. |
      | 2022 | Law of 21 March 2022. | Improved protection for whistleblowers, an issue closely tied to the disclosure of institutional failings. |

      In child protection specifically, the terminology has evolved, moving from "maltreatment" and "ill-treatment" (1989 law) to the notion of "danger" (2007 law), before finally reinstating the terms "child victim of violence" and "maltreated child" in the 2016 and 2022 laws.

      1.3. The Definition of Maltreatment in the Law of 7 February 2022

      Article L119-1 of the Code de l'Action Sociale et des Familles (CASF) is a major step forward. It defines maltreatment as follows:

      "Maltreatment [...] concerns any person in a situation of vulnerability when a gesture, word, action, or failure to act compromises or harms their development, rights, fundamental needs, or health [...] and this harm occurs within a relationship of trust, dependence, care, or support.

      Situations of maltreatment may be one-off or lasting, intentional or not.

      Their origin may be individual, collective, or institutional."

      Analysis of this definition:

      Strengths: It is broad and recognizes both the person's vulnerability and the relationship of dependence.

      It separates maltreatment from criminal offence, so that a situation can be classified as maltreatment without an offence necessarily being established.

      It explicitly names the "institutional" origin.

      Limits: The text does not define what institutional maltreatment is in itself.

      Moreover, this approach runs up against the logic of criminal law, which rests on the principle of personal liability and provides no specific offence tied to the institutional context or to the vulnerability of the people being supported.

      2. Quantifying and Measuring the Phenomenon

      2.1. A Lack of Statistical Data

      A major obstacle to understanding and combating institutional violence is the absence of clear quantification in public statistics.

      National surveys (ONPE, INED) provide little specific information on the phenomenon, which makes its extent difficult to assess.

      2.2. Professionals' Exposure to Violence

      Despite the lack of overall data, figures on violence experienced by professionals are revealing of the climate in the social sector.

      • Civil service data show that intermediate professions in health and social work are particularly affected by violence in the course of their duties.

      • Their level of exposure to violence is almost as high as that of law enforcement, which underlines the intensity of tensions and the possible normalization of violence in this field.

      • A very small percentage of this violence leads to a complaint and results in a criminal conviction, which is a major issue for recognition of the harm suffered.

      3. The Central Question of Responsibility

      3.1. Beyond Individual Responsibility

      The Commission nationale de lutte contre les maltraitances stresses that institutional maltreatment and individual responsibility are not mutually exclusive.

      It is essential to distinguish deviant individual behaviour from collective or systemic failings that engage society as a whole.

      The challenge is not to reduce institutional maltreatment to a mere sum of professional misconduct.

      3.2. Multiple Chains of Responsibility

      Child protection in particular involves complex, intertwined chains of responsibility:

      Family responsibility: Often already undermined in protection cases.

      Professionals' responsibility: They are in direct contact with users.

      Institutional responsibility: Tied to organization, resources, and internal culture.

      Societal responsibility: Reflecting collective thresholds of tolerance and the mechanisms put in place to protect the most vulnerable.

      In addition, European case law is increasingly firm, having already condemned France for failings in its child protection system on the ground of inhuman and degrading treatment.

      4. Perspectives and Levers for Action: The Ville de Paris Study

      In partnership with the Université de Lille, the Observatoire social de la Ville de Paris launched a study in 2023 on institutional violence, focused on the experience of professionals in social policy (child protection, autonomy, etc.).

      4.1. Main Preliminary Findings

      1. High Exposure, High Commitment: The study confirms that professionals are heavily exposed to violence, but also reveals a particularly high level of commitment to their work.

      2. The Key Role of Organizational Support: Counter-intuitively, professionals' well-being at work is not directly correlated with the number of violent acts they experience.

      The most decisive factor is the organizational support that staff perceive.

      A professional who feels supported by their institution will cope better day to day, even in a violent context.

      3. The Impact of "Moral Distress": The second decisive factor is "moral distress".

      This concept, drawn from Canadian research, describes the sense of powerlessness felt by professionals who believe they lack the leeway or resources needed to respond satisfactorily to users' needs.

      4.2. Avenues for Work Identified

      These results, although preliminary, open up concrete avenues for preventing institutional maltreatment by acting on the working climate and professionals' well-being.

      The levers identified are:

      Strengthen organizational support: Put in place concrete mechanisms for listening to, recognizing, and backing teams.

      Improve support within teams: Foster cohesion and mutual aid among colleagues.

      Increase leeway: Give professionals back the capacity to act in ways suited to each situation, thereby reducing moral distress.

      Work on ethics and shared values: Consolidate a common professional culture to guide action in complex contexts.

    1. Context length is the maximum number of tokens that the model has access to in memory. The default context length in Ollama is 4096 tokens. Tasks which require large context like web search, agents, and coding tools should be set to at least 64000 tokens.

      The default Ollama context length is 4k tokens. The recommended minimum for web search, agents, and coding tools (such as Claude Code or OpenCode) is 64k. I've seen 128k recommended for Claude Code.
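
      As a rough sketch of how the context length can be raised per request, the snippet below passes `num_ctx` in the `options` field of Ollama's local `/api/generate` endpoint. The model name `llama3.2` is only a placeholder assumption, and the helper names are hypothetical; the 64000 value is the minimum quoted above.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, num_ctx: int = 64000) -> dict:
    """Build a generate request that overrides the context length."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # ask for one JSON reply, not a stream
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }


def generate(model: str, prompt: str, num_ctx: int = 64000) -> str:
    """Send the request to a running Ollama server and return its text reply."""
    payload = json.dumps(build_request(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Placeholder model name; requires a running server to actually call generate().
request_body = build_request("llama3.2", "Summarize this repository.")
```

      A larger context window uses more memory, so the value is worth tuning to the hardware rather than always setting it to the maximum.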

    1. Data can be poisoned intentionally as well. For example, in 2021, workers at Kellogg’s were upset at their working conditions, so they agreed to go on strike, and not work until Kellogg’s agreed to improve their work conditions. Kellogg’s announced that they would hire new workers to replace the striking workers: “Kellogg’s proposed pay and benefits cuts while forcing workers to work severe overtime as long as 16-hour-days for seven days a week. Some workers stayed on the job for months without a single day off. The company refuses to meet the union’s proposals for better pay, hours, and benefits, so they went on strike. Earlier this week, the company announced it would permanently replace 1,400 striking workers.” (People Are Spamming Kellogg’s Job Applications in Solidarity with Striking Workers – Vice MotherBoard) People in the antiwork subreddit found the website where Kellogg’s posted their job listing to replace the workers. So those Redditors suggested they spam the site with fake applications, poisoning the job application data, so Kellogg’s wouldn’t be able to figure out which applications were legitimate or not (we could consider this a form of trolling). Then Kellogg’s wouldn’t be able to replace the striking workers, and they would have to agree to better working conditions. Then Sean Black, a programmer on TikTok saw this and decided to contribute by creating a bot that would automatically log in and fill out applications with random user info, increasing the rate at which he (and others who used his code) could spam the Kellogg’s job applications:

      This example shows how data poisoning can be used deliberately as a form of collective action, where misleading data is introduced to disrupt a system’s ability to function as intended. It also raises ethical questions about whether manipulating data is justified when it is used to counteract perceived corporate power and support labor rights, rather than for personal gain.

    1. suspiciously precise floats, or, how I got Claude's real limits

      Summary: Claude Usage Limits & Cost Analysis

      Subscription vs. API Efficiency

      * Massive Cost Savings: Claude subscriptions can be up to 36x cheaper than using the API for equivalent token throughput.
      * The "Max 5x" Sweet Spot: The $100/mo Max 5x plan is identified as the most optimized tier, offering roughly 8.3x more weekly usage than the Pro plan (exceeding its "5x" marketing).
      * The "Max 20x" Diminishing Returns: While the $200/mo tier provides 4x higher short-term (5-hour) burst limits than the Max 5x, its weekly ceiling is only ~2x higher, making it less efficient for consistent long-term work.

      Dual-Layer Usage Framework

      * 5-Hour Rolling Window: Controls "burst" activity. The counter starts at your first prompt; once the limit is reached, you must wait for the window to reset.
      * 7-Day Weekly Ceiling: A hard cap on total "active compute hours" (time spent processing tokens/reasoning). This acts as a global safety valve for the system.
      * Unified Quota: All usage across the browser (claude.ai), Claude Desktop, and Claude Code (terminal) counts toward the same unified limit.
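
      The 5-hour layer can be pictured with a toy model. This is only a sketch of the behavior as described in the summary (a window anchored at the first prompt, then a reset), not Anthropic's actual accounting; the class name, the message-based limit, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # burst window length quoted in the summary


class BurstWindow:
    """Toy model: the window opens at the first prompt and resets 5 hours later."""

    def __init__(self, limit: int):
        self.limit = limit   # hypothetical per-window budget, in prompts
        self.start = None    # set when the first prompt of a window arrives
        self.used = 0

    def try_prompt(self, now: datetime, cost: int = 1) -> bool:
        # Open a fresh window on the first prompt or once the old one expired.
        if self.start is None or now >= self.start + WINDOW:
            self.start = now
            self.used = 0
        if self.used + cost > self.limit:
            return False     # limit reached: wait for the reset
        self.used += cost
        return True

    def resets_at(self):
        return None if self.start is None else self.start + WINDOW


window = BurstWindow(limit=2)
t0 = datetime(2026, 2, 1, 9, 0)
accepted = [window.try_prompt(t0 + timedelta(hours=h)) for h in (0, 1, 2, 5)]
```

      In this model the third prompt is refused and the fourth, sent exactly five hours after the first, opens a new window.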

      Token Consumption Dynamics

      * Compounding Context Cost: Claude re-reads the entire chat history for every new message, so cumulative input grows roughly with the square of the thread length. A 50-message thread uses significantly more tokens (and quota) than five separate 10-message chats.
      * Input-Heavy Bias: Large file attachments, long project instructions, and extensive "Extended Thinking" sessions consume the quota much faster than short, text-only queries.
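
      The cost of re-reading history can be made concrete with a little arithmetic. The sketch below assumes every message is the same size (200 tokens, an arbitrary figure) and ignores replies, attachments, and any caching, so it illustrates the growth pattern rather than real billing.

```python
def cumulative_input_tokens(num_messages: int, tokens_per_message: int) -> int:
    """Total tokens read across a thread when each turn re-sends the history.

    Turn k reads k messages (itself plus the k-1 before it), so the total is
    tokens_per_message * (1 + 2 + ... + n) = tokens_per_message * n * (n + 1) / 2.
    """
    return tokens_per_message * num_messages * (num_messages + 1) // 2


TOKENS = 200  # assumed uniform message size

one_long_thread = cumulative_input_tokens(50, TOKENS)        # 255000
five_short_chats = 5 * cumulative_input_tokens(10, TOKENS)   # 55000
ratio = one_long_thread / five_short_chats
```

      Under these assumptions the single 50-message thread reads about 4.6 times as many tokens as the five short chats, even though both contain 50 messages.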

      Optimization Strategies

      * The /compact Command: Users are encouraged to use /compact (in Claude Code) or manually summarize/restart chats every 15–20 messages to reset the "token tax" of long histories.
      * Lean Context: Keeping CLAUDE.md and project documentation concise prevents "context bloat" from draining limits prematurely.
      * Strategic Timing: Since the 5-hour window starts with the first prompt, power users should time their first interaction to align with their most intensive coding blocks.

      Recommendation: Which Claude Tier is Right for You?

      1. The "Value King": Max 5x ($100/mo)

      * Why it’s recommended: Data analysis shows this tier is "over-provisioned." While marketed as 5x, it often provides ~6x higher session limits and ~8.3x higher weekly limits than the Pro plan.
      * The Sweet Spot: It offers the best balance between a massive increase in capacity and price. Most daily professional users and developers find it nearly impossible to hit these limits even with "all-day" use.
      * Best For: Professional developers, heavy researchers, and those who want consistent access to the Opus model without hitting daily caps.

      2. The "Sprint Specialist": Max 20x ($200/mo)

      * The Caveat: Despite the name, the weekly ceiling is only ~2x higher than the Max 5x plan, not 4x higher. You are essentially paying for a much higher "burst" capacity.
      * Why choose it: It allows for extremely high-intensity sessions (up to 900+ messages in 5 hours). This is useful if you do massive "sprints" where you need Claude to process huge amounts of data or code in a very short window.
      * Best For: Solopreneurs building products in rapid bursts, or users who never want to think about rate limits during a 4–6 hour deep-work session.

      3. The "Standard Choice": Pro ($20/mo)

      * Why it’s recommended: It is 5x cheaper than the next tier and sufficient for 90% of users who use Claude for general writing, light coding, or occasional complex tasks.
      * Best For: Students, casual coders, and users with small projects (under 1,000 lines of code).

      Quick Comparison Summary

      | Plan | Price | Weekly capacity | Best for |
      | --- | --- | --- | --- |
      | Pro | $20/mo | Baseline | Personal/standard use |
      | Max 5x | $100/mo | ~8.3x | Best overall value for power users |
      | Max 20x | $200/mo | ~16x | High-intensity "burst" work |
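
      One way to read the comparison is as price per unit of weekly capacity. The sketch below uses only the approximate multipliers quoted in this summary (Pro as the 1x baseline, ~8.3x, ~16x); they are estimates from the post, not published limits, and the variable names are hypothetical.

```python
# plan name -> (monthly price in USD, weekly capacity relative to Pro)
PLANS = {
    "Pro": (20, 1.0),
    "Max 5x": (100, 8.3),
    "Max 20x": (200, 16.0),
}

# Dollars paid per "Pro-equivalent" unit of weekly capacity.
cost_per_unit = {
    name: round(price / capacity, 2) for name, (price, capacity) in PLANS.items()
}

for name, cost in cost_per_unit.items():
    print(f"{name}: ${cost:.2f} per Pro-equivalent week of capacity")
```

      On these figures Max 5x comes out slightly cheaper per unit than Max 20x, which matches the summary's point that the top tier mainly buys burst capacity rather than better weekly value.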

    1. 6 pillars of the Well-Architected Framework

      1. Operational Excellence
        • perform operations as code (IaC)
        • learn from operational failure
        • anticipate failure
      2. Reliability
        • automatically recover from failure
        • scale horizontally
        • stop guessing capacity
        • manage change with automation
      3. Security
        • principle of least privilege, IAM
        • protect data in transit and at rest
        • security at all layers
      4. Performance Efficiency
        • go global in minutes
        • experiment frequently
        • use serverless architecture
      5. Sustainability
        • adopt efficient tech
        • use managed services
      6. Cost Optimization
        • pay for what you use
        • use CloudWatch to measure efficiency
        • use tags to measure ROI
        • use managed services to reduce cost
    1. Sexual orientation is challenging to code reliably in any context, for reasons described in greater detail by scholars such as Adrienne Shaw and Elizaveta Friesem. Writing about the creation of the LGBTQ Video Game Archive

      And gender, and any/most self-defined identities, which can and do change over time (and probably will more in the future, with transhumanism), even if they are stereotypically pushed top-down, like functional diversity.


    1. Briefing Document: The "Alpha Male" Movement

      Summary

      This briefing document summarizes the themes, key figures, and impacts of the "alpha male" movement, a social phenomenon originating mainly on social media.

      Driven by influencers such as Andrew Tate internationally and Quebec figures such as Julien Bournival, the movement advocates a return to traditional values and strictly defined gender roles, in which the man is the provider and leader and the woman, more submissive, devotes herself to the home.

      Its core ideology rests on a form of biological determinism, asserting that men and women have innate, distinct characteristics that destine them for different roles.

      This discourse resonates particularly with young men in search of bearings, drawn to a message that blends self-improvement (discipline, physical fitness, entrepreneurial success) with a rhetoric of rebellion against an establishment perceived as hostile.

      Experts analyze the movement as a contemporary manifestation of a recurring antifeminist discourse, intrinsically misogynistic, that expresses a fear of losing male privileges as gender equality advances.

      The phenomenon is closely tied to a generalized distrust of institutions (government, media, science), to belief in conspiracy theories about a manipulative "elite", and to a convergence with conservative right-wing ideologies, including a return to the Christian religion.

      Socially, the movement contributes to a growing ideological polarization between young men, who tend to become more conservative, and young women, who are increasingly progressive.

      Its influence is now felt even in classrooms, where regressive and masculinist discourse is resurfacing, underscoring the need for continued vigilance as gains in equality are called into question.

      --------------------------------------------------------------------------------

      1. Definition and ideology of the "alpha male" movement

      The "alpha male" movement is defined as a phenomenon originating with web and social media influencers who advocate a return to certain traditional values. Its ideology rests on several fundamental pillars.

      Core principles:

      Traditional gender roles: The man takes on the role of leader and provider ("provider", "head of the household"), while the woman is more submissive and devotes herself to the home and family ("nurture").

      One influencer states: "The rule is that a man is a man and is masculine, and he has to be the head of the household, and he's the provider."

      Male strength and responsibility: The "alpha" man must be physically and mentally strong, take responsibility, and protect and provide for his family.

      Control within the relationship: Some discourse promotes control over the female partner. One viral clip states: "When you're in a relationship, you don't let your girlfriend go out to a club.

      You don't let your girlfriend go to a festival. You don't let your girlfriend post photos of her butt in a G-string on Instagram."

      Justification through biological determinism:

      Innate differences: Supporters argue that men and women are biologically different, which determines their character traits. Men are said to be naturally "assertive", "direct", and "go-getting", while women are said to have superior "sensitivity" and "intuition".

      Rejection of equal abilities: Julien Bournival considers the idea that men and women are equal in aptitudes and skills "completely ridiculous".

      Hypergamy: A dating coach in the movement claims that women are biologically attracted to men who are superior to them in confidence, charisma, salary, height, and strength, a concept he calls "hypergamy".

      The "crisis of masculinity":

      • The movement's influencers believe there is a "crisis of masculinity" caused by a society that sees masculinity as "toxic" and tries to "emasculate" men.

      • This perception is shared by young men who feel attacked or devalued. One young woman observes: "When you keep being told you're mean, you're no good, you're a problem, well, I think their reaction is anger."

      2. Key figures and their discourse

      Several influencers are identified as central figures in the movement, each with a distinct style and reach.

      Andrew Tate: The international figurehead

      Profile: British-American influencer and former kickboxing champion, described as a "mega star" and one of the most searched people on Google. He was arrested in Romania on charges of human trafficking, rape, and forming a criminal gang.

      Dual message: His discourse mixes self-improvement (discipline, determination, taking responsibility) with remarks judged "disrespectful, misogynistic toward women".

      Defence by his supporters: His followers, such as Julien Bournival, defend "the essence of his message" while downplaying his controversies, calling them "inappropriate jokes" or the acts of a "character" meant to provoke. One young man says: "if it helps me make money, I don't see why, I mean, he's a bad person".

      Julien Bournival: The Quebec model in Florida

      Profile: A Quebec entrepreneur based in Florida, he describes himself as being in the "1% in terms of income" and the "1% in terms of fitness". He left Quebec, which he called a "socialist republic" during the pandemic.

      Discourse: He advocates a return to traditional values, defines himself as a "provider", and lives in a relationship in which his wife looks after the home and family. He increasingly ties his values to his Christian faith.

      Business activity: He runs a company (Global) in the energy efficiency sector, but uses his staff meetings as "personal growth" sessions in which he promotes his worldview, asserting that entrepreneurs have a "moral responsibility" to build a strong people against "leaders" who want a weak, controllable people.

      Louis Rassico: The repentant influencer

      Background: A young Quebec trainer, he was one of the first "alpha male" figures in Quebec. He admits having been influenced by Andrew Tate and having copied his "intense" style ("Shut your mouth") to gain popularity, which worked.

      Distancing: He has since changed his message, calling Tate a "manipulator" and describing his own journey as a "deprogramming" or "deradicalization". He realized that he was "losing touch with the real reality of things".

      Chloé Roma: The men's rights advocate

      Position: A Canadian who has found great success defending men's rights. She argues that men are in crisis, lack positive role models, and are still subject to the expectation of being "protector and provider", unlike women, who are now seen as capable of multiple roles.

      Analysis of Tate: She believes Tate's success is explained by his having reached an audience of men without a father figure or positive male role model, but criticizes the fact that his message reinforces the negative expectations already weighing on men.

      3. Critical analysis and societal impacts

      Experts and civil society actors offer a critical analysis of the movement and its consequences.

      Sociological perspective (Francis Dupuis-Déri):

      Recurring discourse: The "crisis of masculinity" is not a new phenomenon. Similar discourse has existed since Roman antiquity and in every century since, whatever the political or cultural context.

      Misogynistic nature: The crisis discourse is "necessarily misogynistic" because it posits that (1) men are doing badly, (2) women are to blame, and (3) the solution is a return to traditional masculinity.

      Response to equality: The movement is a form of antifeminism driven by men who "do not want equality" and see progress in women's rights as a "threat" to their privileges.

      Refutation of determinism: The idea of biologically defined roles is contradicted by human history, which shows a great diversity of roles taken on by men and women. Differences in roles stem above all from "socialization and different upbringings".

      Impact in schools (Véronique Guitras, teacher):

      Return of regressive discourse: The teacher noticed a "clash of discourse" in her classroom after returning from maternity leave. Male students now express "conservative, traditional, masculinist" views.

      Concrete examples: One student told her that every woman's aspiration is to be "invited onto a yacht in Dubai", and that women are not "builders" like men. She describes the phenomenon as going "60 years backwards".

      Growing ideological polarization:

      Gender gap: An ideological gap is widening among young people in the West: young women are becoming increasingly progressive and feminist, while young men are becoming increasingly conservative.

      "Man or bear" debate: This viral debate illustrates women's distrust of men.

      One young woman explains that she would rather meet a bear in the forest, because "when the bear attacks me, nobody is going to ask me how I was dressed beforehand".

      Another says it is "necessary for us to be wary of all men" for their own safety.

      4. Links with conservatism and conspiracy theories

      The "alpha male" discourse is intrinsically linked to distrust of institutions and to adherence to conservative and conspiracist ideologies.

      Distrust of institutions:

      Rejection of authority: There is a widespread loss of trust in science, medicine, government, and above all the media, described as the "government's advertising agency".

      This phenomenon was "considerably accelerated" by the pandemic.

      Feeling of abandonment: According to anthropologist Samuel Viger, this rejection may stem from a feeling of having been abandoned by the system (housing and health care crises, growing inequality), which pushes some individuals toward fringe discourses.

      Conspiracist rhetoric:

      The manipulative elite: The movement's influencers spread the idea that a satanic "elite" controls the world and seeks to weaken the population by attacking the traditional family, by "brainwashing" children, and by promoting a society of "weak person".

      The posture of rebellion: Adopting "alpha male" values is presented as "the ultimate rebellion" against this system of control.

      Convergence with the right and with religion:

      Right-wing ideology: The movement aligns itself with conservative values. Julien Bournival admires Donald Trump and moved to Florida for the Republican way of life promoted by Ron DeSantis.

      Return to the Christian faith: Several figures in the movement, including Bournival, turn to the Bible to justify traditional values.

      The biblical passage on the submission of wife to husband (Ephesians 5:22-33) is cited as a "code of ethics". Faith is presented as a moral guarantee for the woman's submission.

      Hostility toward gender minorities:

      Rigid view of gender: The existence of transgender people and drag queens is perceived as a direct attack on their "biologizing", natural conception of man and woman.

      Accusations of "grooming": Drag queens who read stories to children are accused of "grooming" and of being part of a "satanic agenda".

      This rhetoric escalates to comparisons with pedophilia: "what's the next thing [...] it's we're going to accept pedophiles".

      5. Notable quotations

      | Theme | Quotation | Speaker |
      | --- | --- | --- |
      | Alpha male ideology | "Chris, go to the gym, stop doing your 'couches de guilleir'." | Audio clip from an influencer |
      | Traditional roles | "The rule is that the man is a man and is masculine, and he has to be the head of the household, and he's the provider." | Julien Bournival |
      | Female submission | "Me, I'd rather be in the shadow, taking care of our home \[...\] Go to war, go to the front, I'll stay behind." | Julien Bournival's partner |
      | Criticism of Andrew Tate | "We can agree that Andrew Tate makes disrespectful, misogynistic remarks about women in general." | Journalist |
      | Defense of Andrew Tate | "The essence of his message \[...\] is respect yourself, respect others, take care of yourself. Make sure that when you say something, your word is worth something." | Julien Bournival |
      | Impact in schools | "I find myself in front of young people who hold conservative, traditional, masculinist views \[...\] Ah, we've gone back 60 years." | Véronique Guitras, teacher |
      | Sociological analysis | "The crisis discourse \[...\] says men are doing badly. They are doing badly because of whom? They are doing badly because of women." | Francis Dupuis-Déri, sociologist |
      | Polarization | "When the bear attacks me, nobody is going to ask me what I was wearing beforehand." | Young woman |
      | Conspiracy theory | "Those who control the world are satanic. They control governments." | Julien Bournival |
      | Repentance | "\[Andrew Tate\] is clearly manipulative. \[...\] I myself was influenced by him \[...\] I deprogrammed myself." | Louis Rassico, trainer |
      | Vigilance | "I have two daughters. I don't want them to live in an unequal world. I don't want them to be subservient to anyone. \[...\] nothing is won forever." | Journalist |

    1. So, let's look at how that transaction takes place. Let's suppose you're visiting a web page in your browser, and there's a link on it: www.host.edu/page.html. Now, your browser is viewing a document that is coded in the Hypertext Markup Language, which is the language used to code web pages. So, when you click on that link, your browser knows to send a request over the Internet for that page that you've requested to a web server at host.edu, at this location, www.host.edu. That URI corresponds to an IP address. The IP protocol knows how to route that request to a particular server at host.edu, and that server is constantly listening on Port 80 for incoming HTTP requests. (A port is a memory location in the server's RAM that is connected to software that listens for incoming requests.) When it gets the request, it needs to access the page that you requested, page.html, so it goes to its disk drive and retrieves page.html with all of its included resources. In this case, there's a picture, smiley.png, and it then sends it back, encoded as an HTML document, of course, to the browser that requested it. The browser then renders the page, and it appears the way you'd expect it to. So those are the seven parts of the transaction seen from a high level. We'll look in some detail later on. Another example is bringing up an app in App Inventor like Paint Pot.

      Include HTTP and HTML model picture.
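
To make the request step in the passage above concrete, here is a minimal sketch of the text a browser might send for the link `www.host.edu/page.html`. It only builds the request bytes and does not open a network connection. The function name `build_get_request` is hypothetical, and the `Connection: close` header is an illustrative choice rather than something the transcript mentions.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, raw request bytes) for a plain HTTP GET.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    # Links are often written without a scheme, so assume plain HTTP.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname
    port = parts.port or 80          # port 80 is the conventional HTTP port
    path = parts.path or "/"         # the page being requested
    request = (
        f"GET {path} HTTP/1.1\r\n"   # request line: method, path, version
        f"Host: {host}\r\n"          # tells the server which site is wanted
        "Connection: close\r\n"      # illustrative header (an assumption)
        "\r\n"                       # blank line ends the header section
    )
    return host, port, request.encode("ascii")


host, port, raw = build_get_request("www.host.edu/page.html")
print(host, port)
print(raw.decode("ascii"))
```

The server's reply would carry the HTML for `page.html`; the browser would then issue a similar request for each included resource such as `smiley.png`.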

  2. Jan 2026
    1. CloudFormation:

      • Infrastructure as Code
        • control VM, OS, and application
        • defines and manages AWS infrastructure
        • provides templates
        • works with ECS, S3, EFS, RDS
        • manages entire stack of resources
        • full control over
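
A minimal sketch of what "infrastructure as code" with templates can look like: a CloudFormation template is a JSON or YAML document whose `Resources` section declares what to create. The logical ID `NotesBucket` and the description text are hypothetical names chosen for illustration; the script only prints the template and does not call AWS.

```python
import json

# A CloudFormation template expressed as a Python dict.
# "NotesBucket" is a hypothetical logical ID chosen for this sketch.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative stack with a single S3 bucket",
    "Resources": {
        "NotesBucket": {
            "Type": "AWS::S3::Bucket",  # resource type for an S3 bucket
        },
    },
}

# Serialize to JSON, one of the two formats CloudFormation accepts.
template_json = json.dumps(template, indent=2)
print(template_json)
```

Saved to a file, a document like this could be handed to CloudFormation, which would then create, update, or delete the declared resources together as one stack.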

      Elastic Beanstalk

        • Platform as a Service
        • simplified application deployment and scaling
        • limited control over
        • use cases: complex archi

      CodeDeploy

        • deploy code

      CodeCommit

        • comparable to github
        • stores code and provides version control
        • allows collaboration

      CodeBuild

        • able to build apps in the cloud
        • compiles and runs the application

      CodePipeline

        • commit -> build -> deploy in one step

      CodeArtifact, CodeStar, Cloud9, and SSM

    1. Analytical Synthesis: The Mechanisms of Coercive Control and Intrafamilial Violence

      Executive Summary

      This document analyzes the dynamics of coercive control through the lens of court hearings and expert testimony presented in the ARTE investigation.

      Coercive control is not limited to isolated acts of physical violence; it is a deliberate system of domination aimed at taking away the victim's freedom. The key points identified include:

      The systemic nature of control: It is an overall strategy that includes micro-surveillance, social isolation, and psychological belittlement.

      The tactical arsenal: The use of technology (GPS, cameras, access to social media accounts) and economic pressure to maintain a total hold.

      The aggressor's rhetoric: A systematic tendency to reverse blame, minimize the facts, and use emotional pretexts to justify the violence.

      Legal developments: The need to incorporate the notion of coercive control into law in order to dismantle relations of domination historically rooted in the Civil Code.

      --------------------------------------------------------------------------------

      1. Definition and Strategies of Coercive Control

      Coercive control is described as a "weapon par excellence" for subjugating the other person.

      Unlike one-off violence, it is sustained over time and omnipresent.

      Surveillance and micro-control mechanisms

      The aggressor seeks to invade the victim's psychological, intimate, and professional space by various means:

      Technological surveillance: Installing GPS trackers under vehicles, using surveillance cameras in the home, and demanding access codes to phones and social media accounts.

      Nighttime intrusion: Sleep deprivation through loud music or forced awakenings during the night to extract "confessions" of imaginary infidelity.

      Control of the body: Monitoring clothing and, in extreme cases, inspecting underwear for evidence of extramarital sex or prostitution.

      Isolation and belittlement

      Control works by creating a social desert around the victim:

      Severing ties: Forbidding or limiting visits to family (the mother in particular) and friends, except in the aggressor's presence.

      Attacks on dignity: Use of degrading language ("whore", "slut", "less than nothing") and constant denigration of professional or maternal abilities.

      Pathologizing the victim: Calling the victim "hysterical" or "crazy" to discredit her word and justify the control.

      --------------------------------------------------------------------------------

      2. The Cycle of Domination and Coercion

      The move from verbal violence to physical violence and confinement follows an often predictable progression.

      Physical and material coercion

      Confinement: Locking the house to prevent the victim from leaving, or locking oneself in with her to prevent her from fleeing an argument.

      Control of basic needs: Forbidding the children or the partner access to the kitchen, strict control of grocery shopping (bringing home only water, forcing the victim to consume expired products).

      Economic hold: Appropriating welfare benefits and systematically criticizing how money is managed, with the aim of creating total dependence.

      Threats and domestic terrorism

      The climate of fear is maintained through explicit and recurring death threats:

      Homicide threats: Repeated text messages ("I'm going to kill you", "I'm going to finish you off"), pointing a loaded firearm at the victim, or threats to hurl the victim off a bridge or against a wall on the highway.

      Suicide blackmail: Using the threat of killing oneself to manipulate the victim and prevent her from leaving.

      Direct physical violence: Spitting in the face, strangulation to the point of unconsciousness, headbutts, and physical pressure to "silence" the victim.

      --------------------------------------------------------------------------------

      3. Analysis of the Aggressors' Defense

      Analysis of the hearings reveals recurring defense patterns among perpetrators of violence, aimed at evading responsibility.

      | Defense tactic | Manifestation observed in the sources |
      | --- | --- |
      | Blame reversal | Claiming that the victim is "capricious" or "demanding", or that she "provoked" the act through her behavior or supposed infidelity. |
      | Minimization | Describing spitting as a "simple reaction", or death threats as "words spoken in the heat of anger". |
      | Justification by personal trauma | Invoking the death of a loved one, a violent upbringing, or work overload to excuse the act. |
      | Denial of reality | Disputing the facts despite material evidence (text messages, police reports, medical assessments). |
      | Presenting oneself as the victim | Describing oneself as the "truly wronged party" in the story, the one who gave all his love and got nothing in return. |

      --------------------------------------------------------------------------------

      4. Traumatic Impact and Social Consequences

      The violence does not stop at the direct victim; it radiates across the whole family circle.

      Impact on children: Children witness the violence and are sometimes its targets.

      They live in an atmosphere of terror ("hiding in the bedroom", "being afraid of their father").

      The aggressor may even hold them responsible for his own emotional state.

      The intergenerational cycle: Experts and judges stress the risk that children will reproduce these patterns of violence as adults if the patterns are not interrupted.

      The separation period: Identified as the most dangerous phase.

      It is often when the victim tries to regain her freedom that the violence peaks (harassment by text message, repeated drive-bys around the home, use of trackers).

      --------------------------------------------------------------------------------

      5. Institutional and Legal Perspectives

      The document highlights the gap between the aggressor's perception and the legal norm.

      The historical legacy: Recalling the former Article 213 of the Civil Code (1803), which required a wife to obey her husband, helps explain the persistence of archaic structures of domination in the minds of some aggressors.

      The role of the justice system: The judiciary, now predominantly female, has the task of restating the law and dismantling these power relations.

      The law must reach "into the heart of intimate relationships" to protect individual freedom.

      Penalties and obligations: The sentences cited include prison terms (some served under electronic monitoring), fines for emotional harm, orders barring the offender from the home, and mandatory accountability courses on sexist violence.

      --------------------------------------------------------------------------------

      Final note: Coercive control is defined by the will to "keep under permanent, total, and absolute control" a person, reducing them to an object of property rather than a subject of rights.

      Its recognition by the courts is the essential tool for breaking this system of oppression.

    1. illegal immigration

      illegal immigration. I don't know what else you would call it. But people who migrated outside the process preferred by the United States are also called illegal aliens. People online agree that the term "alien" is right because it describes that group of people, but even if it is the legal term, that does not mean it is morally right. Among the general population, "alien" carries a negative connotation. It also dehumanizes them. Humans are not aliens; they are human beings on planet Earth. Why do that?

    1. Social workers should consider ethical theory and principles generally, social work theory and research, laws, regulations, agency policies, and other relevant codes of ethics, recognizing that among codes of ethics social workers should consider the NASW Code of Ethics as their primary source.

      I highlighted this section to relate it to some of the times at work I have seen social workers fail their clients and not live up to the ethical standards set by the NASW. While the NASW Code of Ethics is an important guide, putting it into practice isn’t as straightforward as it sometimes sounds. Our decisions are shaped not only by ethical theory and agency policies, but also by the real-life systems our clients are navigating every day. Many of the communities social workers work with, including immigrants, people of color, and those living in poverty and dealing with homelessness, are dealing with systems that were never designed with them in mind or, rather, were set up to work exactly as they do. Laws, policies, and even helping institutions can unintentionally reinforce the same inequalities they claim to address. As a Latina woman, I can’t ignore how racism, classism, and gender inequality show up in these systems, because I see how they directly impact the people sitting across from me. The code of ethics constantly reminds social workers to keep questioning the role they play within these power structures. Are we truly advocating for our clients, or are we sometimes acting as gatekeepers for systems that continue to marginalize them? Regardless, there is much that needs to change.

    2. Rather, a code of ethics sets forth values, ethical principles, and ethical standards to which professionals aspire and by which their actions can be judged.

      I chose to annotate this because the idea that a code of ethics lays out values and principles that guide our actions really shows up in day-to-day work. The NASW Code of Ethics isn’t something I think about only in theory; it’s something that helps me decide how to respond to people in real situations, especially in places like a detox unit where emotions are high and people are vulnerable. In substance use treatment, a lot of clients already feel judged or ashamed before they even walk through the door. As a Latina, I see how cultural expectations, family pressure, and stigma around addiction can affect how someone acts in treatment. When clients seem withdrawn or unmotivated, it’s often not because they don’t care, but because they’re scared, embarrassed, or trying to protect their family image. For example, on a detox unit that offers both inpatient and outpatient services, I worked with a Latina client detoxing from alcohol. She barely spoke in groups and kept asking to be discharged early. Some staff saw her as resistant or noncompliant. When I talked with her individually, she opened up about feeling like she had let her family down. She was also worried about outpatient appointments interfering with work and potentially losing her job. Looking at the situation through the lens of the NASW Code of Ethics helped me respond differently, and especially without judgment.

    1. his will increase the debt of the State that amount.... It seems to us a propitious time to revise our Penal Code, and abolish the penitentiary system—adopting in lieu thereof the principles embodied in the Codes of South and North Carolina [corporal and capital punishment].

      And they refuse to ask the north to help because they must keep racial hierarchy

    1. The discs and cartridges of digital games, which can also be analogous to collections of physical pieces, can be sold, lent, borrowed, or stolen in much the same way. Even when these activities violate terms and conditions, for games that are not digitally distributed or networked, such terms and conditions were/are hard to enforce. However, game software and digital platforms and their collections of assets and code do not belong to players, and lending or imposing our own terms of use on them is explicitly prohibited and technically challenging to implement.

      Not necessarily... you know, there is a vibrant community behind videogame cracking (and virtualisation). And DRM-free titles (from GOG, or itch) provide a cheesy way of sharing games more easily than sharing "disks".

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make it obvious the gap they are testing and trying to explain. The work also includes social contexts that move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (Q1) I also have some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.

      We thank the reviewer for this suggestion. Following the comment, we added a hierarchical Bayesian estimation. We built a hierarchical model with both group-level (adolescent group and adult group) and individual-level structures for the best-fitting model. Four Markov chains with 4,000 samples each were run, and the model converged well (see Figure supplement 7).

      We then analyzed the posterior parameters for adolescents and adults separately. The results were consistent with those from the MLE analysis (see Figure 2—figure supplement 5). These additional results have been included in the Appendix Analysis section (also see Figure supplement 5 and 7). In addition, we have updated the code and provided the link for reference. We appreciate the reviewer’s suggestion, which improved our analysis.

      (Q2) There was also little discussion given the structure of the Prisoner's Dilemma, and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than a lower preferences for cooperation if all else was the same.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma.

      However, our computational modeling explicitly addressed this possibility. Model 4 (inequality aversion) captures decisions that are driven purely by self-interest or aversion to unequal outcomes, including a parameter reflecting disutility from advantageous inequality, which represents self-oriented motives. If participants’ behavior were solely guided by the payoff-dominant strategy, this model should have provided the best fit. However, our model comparison showed that Model 5 (social reward) performed better in both adolescents and adults, suggesting that cooperative behavior is better explained by valuing social outcomes beyond payoff structures.

      Moreover, if adolescents’ lower cooperation simply reflected a strategic response to the payoff structure, with defection adopted as the more rewarding option, then adolescents should show reduced cooperation across all rounds. Instead, adolescents and adults behaved similarly when partners defected, but adolescents cooperated less when partners cooperated and showed little increase in cooperation even after consecutive cooperative responses. This pattern suggests that adolescents’ lower cooperation cannot be explained solely by strategic responses to payoff structures but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded our Discussion to acknowledge this important point and to clarify how the behavioral and modeling results address the reviewer’s concern.

      “Overall, these findings indicate that adolescents’ lower cooperation is unlikely to be driven solely by strategic considerations, but may instead reflect differences in the valuation of others’ cooperation or reduced motivation to reciprocate. Although defection is the payoff-dominant strategy in the Prisoner’s Dilemma, the selective pattern of adolescents’ cooperation and the model comparison results indicate that their reduced cooperation cannot be fully explained by strategic incentives, but rather reflects weaker valuation of social reciprocity.”

      Appraisal & Discussion:

      (Q3) The authors have partially achieved their aims, but I believe the manuscript would benefit from additional methodological clarification, specifically regarding the use of hierarchical model fitting and the inclusion of Bayes Factors, to more robustly support their conclusions. It would also be important to investigate the source of the model confusion observed in two of their models.

      We thank the reviewer for this comment. In the revised manuscript, we have clarified the hierarchical Bayesian modeling procedure for the best-fitting model, including the group- and individual-level structure and convergence diagnostics. The hierarchical approach produced results that fully replicated those obtained from the original maximum-likelihood estimation, confirming the robustness of our findings. Please also see the response to Q1.

      Regarding the model confusion between the inequality aversion (Model 4) and social reward (Model 5) models in the model recovery analysis, both models’ simulated behaviors were best captured by the baseline model. This pattern arises because neither model includes learning or updating processes. Given that our task involves dynamic, multi-round interactions, models lacking a learning mechanism cannot adequately capture participants’ trial-by-trial adjustments, resulting in similar behavioral patterns that are better explained by the baseline model during model recovery. We have added a clarification of this point to the Results:

      “The overlap between Models 4 and 5 likely arises because neither model incorporates a learning mechanism, making them less able to account for trial-by-trial adjustments in this dynamic task.”

      (Q4) I am unconvinced by the claim that failures in mentalising have been empirically ruled out, even though I am theoretically inclined to believe that adolescents can mentalise using the same procedures as adults. While reinforcement learning models are useful for identifying biases in learning weights, they do not directly capture formal representations of others' mental states. Greater clarity on this point is needed in the discussion, or a toning down of this language.

      We sincerely thank the reviewer for this professional comment. We agree that our prior wording regarding adolescents’ capacity to mentalise was somewhat overgeneralized. Accordingly, we have toned down the language in both the Abstract and the Discussion to better align our statements with what the present study directly tests. Specifically, our revisions focus on adolescents’ and adults’ ability to predict others’ cooperation in social learning. This is consistent with the evidence from our analyses examining adolescents’ and adults’ model-based expectations and self-reported scores on partner cooperativeness (see Figure 4). In the revised Discussion, we state:

      “Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in predicting others’ cooperation in social learning”.

      (Q5) Additionally, a more detailed discussion of the incentives embedded in the Prisoner's Dilemma task would be valuable. In particular, the authors' interpretation of reduced adolescent cooperativeness might be reconsidered in light of the zero-sum nature of the game, which differs from broader conceptualisations of cooperation in contexts where defection is not structurally incentivised.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma. However, our behavioral and computational evidence suggests that this pattern cannot be explained solely by strategic responses to payoff structures, but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded the Discussion to acknowledge this point and to clarify how both behavioral and modeling results address the reviewer’s concern (see also our response to Q2).

      (Q6) Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

      We thank the reviewer for the professional comments, which have helped us improve our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.

      Strengths:

      (1) Rigid model comparison and parameter recovery procedure.

      (2) Conceptually comprehensive model space.

      (3) Well-powered samples.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      (Q1) A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of learning rate.

      We thank the reviewer for this thoughtful comment. We agree that social learning from human partners may involve higher-order inferences beyond simple reinforcement learning from non-human sources. To address this, we had previously included such mechanisms in our behavioral modeling. In Model 7 (Social Reward Model with Influence), we tested a higher-order belief-updating process in which participants’ expectations about their partner’s cooperation were shaped not only by the partner’s previous choices but also by the inferred influence of their own past actions on the partner’s subsequent behavior. In other words, participants could adjust their belief about the partner’s cooperation by considering how their partner’s belief about them might change. Model comparison showed that Model 7 did not outperform the best-fitting model, suggesting that incorporating higher-order influence updates added limited explanatory value in this context. As suggested by the reviewer, we have further clarified this point in the revised manuscript.

      Regarding trait-based frameworks, we appreciate the reviewer’s reference to Hackel et al. (2015). That study elegantly demonstrated that learners form relatively stable beliefs about others’ social dispositions, such as generosity, especially when the task structure provides explicit cues for trait inference (e.g., resource allocations and giving proportions). By contrast, our study was not designed to isolate trait learning, but rather to capture how participants update their expectations about a partner’s cooperation over repeated interactions. In this sense, cooperativeness in our framework can be viewed as a trait-like latent belief that evolves as evidence accumulates. Thus, while our model does not include a dedicated trait module that directly modulates learning rates, the belief-updating component of our best-fitting model effectively tracks a dynamic, partner-specific cooperativeness, potentially reflecting a prosocial tendency.

      (Q2) This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.

      We thank the reviewer for the suggestion. Following the comment, we implemented an additional model incorporating a dynamic learning rate based on the magnitude of prediction errors. Specifically, we developed Model 9: Social reward model with Pearce–Hall learning algorithm (dynamic learning rate), in which participants’ beliefs about their partner’s cooperation probability are updated using a Rescorla–Wagner rule whose learning rate is dynamically modulated by the Pearce–Hall (PH) error-learning mechanism. In this framework, the learning rate increases following surprising outcomes (larger prediction errors) and decreases as expectations become more stable (see Appendix Analysis section for details).
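
The mechanism described above can be illustrated with a minimal sketch of a hybrid Rescorla–Wagner/Pearce–Hall update. This is one common formulation of such a model, not necessarily the exact specification of Model 9; the function name and the parameters `p0`, `alpha0`, `kappa`, and `eta` are hypothetical and chosen only for illustration.

```python
def pearce_hall_beliefs(partner_choices, p0=0.5, alpha0=0.5, kappa=1.0, eta=0.3):
    """Track the belief that the partner cooperates, with a dynamic learning rate.

    partner_choices: sequence of 1 (partner cooperated) or 0 (partner defected).
    Returns the belief and the learning rate held *before* each trial.
    All parameter names and defaults are illustrative assumptions.
    """
    p, alpha = p0, alpha0
    beliefs, alphas = [], []
    for c in partner_choices:
        beliefs.append(p)
        alphas.append(alpha)
        delta = c - p                      # prediction error on this trial
        p = p + kappa * alpha * delta      # Rescorla-Wagner step scaled by alpha
        p = min(1.0, max(0.0, p))          # keep the belief a valid probability
        # Pearce-Hall step: alpha moves toward the size of the latest surprise
        alpha = eta * abs(delta) + (1 - eta) * alpha
    return beliefs, alphas
```

In this sketch, a long run of partner cooperation shrinks the learning rate as prediction errors shrink, and a single unexpected defection raises it again on the following trial.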

      The results showed that this dynamic learning rate model did not outperform our best-fitting model in either adolescents or adults (see Figure supplement 6). We greatly appreciate the reviewer’s suggestion, which has strengthened the scope of our analysis. We have now added these analyses to the Appendix Analysis section (also Figure Supplement 6) and expanded the Discussion to acknowledge this modeling extension and further discuss its implications.

      (Q3) Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.

      We thank the reviewer for this professional comment. In addition to the linear analyses, we further conducted exploratory analyses to examine potential non-linear relationships between age and the model parameters. Specifically, we fit LMMs for each of the four parameters as outcomes (α+, α-, β, and ω). The fixed effects included age, a quadratic age term, and gender, and the random effects included subject-specific random intercepts and random slopes for age and gender. Model comparison using BIC did not indicate improvement for the quadratic models over the linear models for α<sup>+</sup> (ΔBIC<sub>quadratic-linear</sub> = 5.09), α<sup>-</sup>(ΔBIC<sub>quadratic-linear</sub> = 3.04), β (ΔBIC<sub>quadratic-linear</sub> = 3.9), or ω (ΔBIC<sub>quadratic-linear</sub>= 0). Moreover, the quadratic age term was not significant for α<sup>+</sup>, α<sup>−</sup>, or β (all ps > 0.10). For ω, we observed a significant linear age effect (b = 1.41, t = 2.65, p = 0.009) and a significant quadratic age effect (b = −0.03, t = −2.39, p = 0.018; see Author response image 1). This pattern is broadly consistent with the group effect reported in the main text. The shaded area in the figure represents the 95% confidence interval. As shown, the interval widens at older ages (≥ 26 years) due to fewer participants in that range, which limits the robustness of the inferred quadratic effect. In consideration of the limited precision at older ages and the lack of BIC improvement, we did not emphasize the quadratic effect in the revised manuscript and present these results here as exploratory.

      Author response image 1.

      Linear and quadratic model fits showing the relationship between age and the ω parameter, with 95% confidence intervals.

      (Q4) Finally, the two age groups compared - adolescents (high school students) and adults (university students) - differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogenous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.

      We appreciate this comment. Indeed, adolescents (high school students) and adults (university students) differ not only in age but also in sociocultural and socioeconomic backgrounds. In our study, all participants were recruited from Beijing and surrounding regions, which helps minimize large regional and cultural variability. Moreover, we accounted for individual-level random effects and included participants’ social value orientation (SVO) as an individual difference measure.

      Nonetheless, we acknowledge that other contextual factors, such as differences in financial independence, socioeconomic status, and social experience, may also contribute to group differences in cooperative behavior and reward valuation. Although our results are broadly consistent with developmental theories of reward sensitivity and social decision-making, sociocultural influences cannot be entirely ruled out. Future work with more demographically matched samples or with socioeconomic and regional variables explicitly controlled will help clarify the relative contributions of biological and contextual factors. Accordingly, we have revised the Discussion to include the following statement:

      “Third, although both age groups were recruited from Beijing and nearby regions, minimizing major regional and cultural variation, adolescents and adults may still differ in socioeconomic status, financial independence, and social experience. Such contextual differences could interact with developmental processes in shaping cooperative behavior and reward valuation. Future research with demographically matched samples or explicit measures of socioeconomic background will help disentangle biological from sociocultural influences.”

      Reviewer #3 (Public review):

      Summary:

      Wu and colleagues find that in a repeated Prisoner's Dilemma, adolescents, compared to adults, are less likely to increase their cooperation behavior in response to repeated cooperation from a simulated partner. In contrast, after repeated defection by the partner, both age groups show comparable behavior.

      To uncover the mechanisms underlying these patterns, the authors compare eight different models. They report that a social reward learning model, which includes separate learning rates for positive and negative prediction errors, best fits the behavior of both groups. Key parameters in this winning model vary with age: notably, the intrinsic value of cooperating is lower in adolescents. Adults and adolescents also differ in learning rates for positive and negative prediction errors, as well as in the inverse temperature parameter.

      Strengths:

      The modeling results are compelling in their ability to distinguish between learned expectations and the intrinsic value of cooperation. The authors skillfully compare relevant models to demonstrate which mechanisms drive cooperation behavior in the two age groups.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (Q1) Some of the claims made are not fully supported by the data:

      The central parameter reflecting preference for cooperation is positive in both groups. Thus, framing the results as self-interest versus other-interest may be misleading.

      We thank the reviewer for this insightful comment. In the social reward model, the cooperation preference parameter is positive by definition, as defection in the rPDG always yields a +2 monetary advantage regardless of the partner’s action. This positive value represents the additional subjective reward assigned to mutual cooperation (e.g., reciprocity value) that counterbalances the monetary gain from defection. Although the estimated social reward parameter ω was positive, the effective advantage of cooperation is Δ = p × ω − 2. Given participants’ inferred beliefs p, Δ was negative for most trials (p × ω < 2), indicating that the social reward was insufficient to offset the +2 advantage of defection. Thus, both adolescents and adults valued cooperation positively, but adolescents’ smaller ω and weaker responsiveness to sustained partner cooperation suggest a stronger weighting on immediate monetary payoffs.
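
To make the arithmetic concrete, the following sketch computes the effective advantage of cooperation from the expression above and passes it through a logistic (softmax) choice rule with inverse temperature β. The choice rule is a standard assumption for models of this kind rather than a detail confirmed by this response, and the function names are hypothetical.

```python
import math

def cooperation_advantage(p, omega, defection_bonus=2.0):
    """Effective advantage of cooperating: expected social reward minus the
    fixed monetary advantage of defecting (+2 in the task described above)."""
    return p * omega - defection_bonus

def p_cooperate(p, omega, beta, defection_bonus=2.0):
    """Probability of cooperating under an assumed logistic choice rule."""
    delta = cooperation_advantage(p, omega, defection_bonus)
    return 1.0 / (1.0 + math.exp(-beta * delta))
```

For example, with a belief p = 0.5 and ω = 3, the advantage is 0.5 × 3 − 2 = −0.5, so the predicted probability of cooperating falls below one half even though ω itself is positive.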

      In this light, our framing of adolescents as more self-interested derives from their behavioral pattern: even when they recognized sustained partner cooperation and held high expectations of partner cooperation, adolescents showed lower cooperative behavior and reciprocity rewards compared with adults. Whereas adults increased cooperation after two or three consecutive partner cooperations, this pattern was absent among adolescents. We therefore interpret their behavior as relatively more self-interested, reflecting reduced sensitivity to the social reward from mutual cooperation rather than a categorical shift from self-interest to other-interest, as elaborated in the Discussion.

      (Q2) It is unclear why the authors assume adolescents and adults have the same expectations about the partner's cooperation, yet simultaneously demonstrate age-related differences in learning about the partner. To support their claim mechanistically, simulations showing that differences in cooperation preference (i.e., the w parameter), rather than differences in learning, drive behavioral differences would be helpful.

      We thank the reviewer for raising this important point. In our model, both adolescents and adults updated their beliefs about partner cooperation using an asymmetric reinforcement learning (RL) rule. Although adolescents exhibited a higher positive and a lower negative learning rate than adults, the two groups did not differ significantly in their overall updating of partner cooperation probability (Fig. 4a-b). We then examined the social reward parameter ω, which was significantly smaller in adolescents and determined the intrinsic value of mutual cooperation (i.e., p×ω). This variable differed significantly between groups and closely matched the behavioral pattern.
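
A minimal sketch of an asymmetric reinforcement learning update of this kind is given below, assuming a Rescorla–Wagner form in which the learning rate depends on the sign of the prediction error. The function and argument names are hypothetical, and the exact parameterization used in the manuscript may differ.

```python
def update_belief(p, partner_cooperated, alpha_pos, alpha_neg):
    """Update the belief p that the partner will cooperate.

    alpha_pos is applied when the partner cooperates more than expected
    (positive prediction error); alpha_neg is applied otherwise.
    """
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p                          # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return p + alpha * delta
```

Because both learning rates lie between 0 and 1, the updated belief always stays between the old belief and the observed outcome.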

      Following the reviewer’s suggestion, we conducted additional simulations varying one model parameter at a time while holding the others constant. The difference in mean cooperation probability between adults and adolescents served as the index (positive = higher cooperation in adults). As shown in Author response image 2, decreases in ω most effectively reproduced the observed group difference (shaded area), indicating that age-related differences in cooperation are primarily driven by variation in the social reward parameter ω rather than by the other parameters.

      Author response image 2.

      Simulation results showing how variations in each model parameter affect the group difference in mean cooperation probability (Adults – Adolescents). Based on the best-fitting Model 8 and parameters estimated from all participants, each line represents one parameter (i.e., α+, α-, ω, β) systematically varied within the tested range (α±: 0.1–0.9; ω, β: 1–9) while other parameters were held constant. Positive values indicate higher cooperation in adults. Smaller ω values most strongly reproduced the observed group difference, suggesting that reduced social reward weighting primarily drives adolescents’ lower cooperation.
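
The one-parameter-at-a-time procedure can be sketched as follows. This is a simplified, deterministic illustration: it assumes an asymmetric Rescorla–Wagner belief update and a logistic choice rule, and it reports the mean predicted cooperation probability for a single simulated agent rather than the adult-minus-adolescent difference plotted in the figure. The partner schedule, baseline values, and function names are hypothetical.

```python
import math

def mean_cooperation(partner_choices, alpha_pos, alpha_neg, omega, beta, p0=0.5):
    """Mean predicted probability of cooperating across a partner schedule."""
    p, total = p0, 0.0
    for c in partner_choices:              # c = 1 (cooperate) or 0 (defect)
        # choice probability from the advantage of cooperating, p * omega - 2
        total += 1.0 / (1.0 + math.exp(-beta * (p * omega - 2.0)))
        delta = c - p                      # prediction error about the partner
        p += (alpha_pos if delta >= 0 else alpha_neg) * delta
    return total / len(partner_choices)

def sweep(partner_choices, base, name, values):
    """Vary one parameter over `values` while holding the others at `base`."""
    results = []
    for v in values:
        params = dict(base, **{name: v})   # copy of base with one value replaced
        results.append((v, mean_cooperation(partner_choices, **params)))
    return results

# Illustrative use with made-up baseline values and a made-up partner schedule
base = dict(alpha_pos=0.4, alpha_neg=0.2, omega=3.0, beta=2.0)
schedule = [1] * 20 + [0] * 5
omega_curve = sweep(schedule, base, "omega", [1, 3, 5, 7, 9])
```

Comparing two such curves, one evaluated at adult-like and one at adolescent-like parameter values, would give a difference index of the kind shown in the figure.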

      (Q3) Two different schedules of 120 trials were used: one with stable partner behavior and one with behavior changing after 20 trials. While results for order effects are reported, the results for the stable vs. changing phases within each schedule are not. Since learning is influenced by reward structure, it is important to test whether key findings hold across both phases.

      We thank the reviewer for this thoughtful and professional comment. In our GLMM and LMM analyses, we focused on trial order rather than explicitly including the stable vs. changing phase factor, due to concerns about multicollinearity. In our design, phases occur in specific temporal segments, which introduces strong collinearity with trial order. In multi-round interactions, order effects also capture variance related to phase transitions.

      Nonetheless, to directly address this concern, we conducted additional robustness analyses by adding a phase variable (stable vs. changing) to GLMM1, LMM1, and LMM3 alongside the original covariates. Across these specifications, the key findings were replicated (see GLMM<sub>sup</sub>2 and LMM<sub>sup</sub>4–5; Tables 9-11), and the direction and significance of main effects remained unchanged, indicating that our conclusions are robust to phase differences.

      (Q4) The division of participants at the legal threshold of 18 years should be more explicitly justified. The age distribution appears continuous rather than clearly split. Providing rationale and including continuous analyses would clarify how groupings were determined.

      We thank the reviewer for this thoughtful comment. We divided participants at the legal threshold of 18 years for both conceptual and practical reasons grounded in prior literature and policy. In many countries and regions, 18 marks the age of legal majority and is widely used as the boundary between adolescence and adulthood in behavioral and clinical research. Empirically, prior studies indicate that psychosocial maturity and executive functions approach adult levels around this age, with key cognitive capacities stabilizing in late adolescence (Icenogle et al., 2019; Tervo-Clemmens et al., 2023). We have clarified this rationale in the Introduction section of the revised manuscript.

      “Based on legal criteria for majority and prior empirical work, we adopt 18 years as the boundary between adolescence and adulthood (Icenogle et al., 2019; Tervo-Clemmens et al., 2023).”

      We fully agree that the underlying age distribution is continuous rather than sharply divided. To address this, we conducted additional analyses treating age as a continuous predictor (see GLMM<sub>sup</sub>1 and LMM<sub>sup</sub>1–3; Tables S1-S4), which generally replicated the patterns observed with the categorical grouping. Nevertheless, given the limited age range of our sample, the generalizability of these findings to fine-grained developmental differences remains constrained. Therefore, our primary analyses continue to focus on the contrast between adolescents and adults, rather than attempting to model a full developmental trajectory.

      (Q5) Claims of null effects (e.g., in the abstract: "adults increased their intrinsic reward for reciprocating... a pattern absent in adolescents") should be supported with appropriate statistics, such as Bayesian regression.

      We thank the reviewer for highlighting the importance of rigor when interpreting potential null effects. To address this concern, we conducted Bayes factor analyses of the intrinsic reward for reciprocity and reported the corresponding BF10 for all relevant post hoc comparisons. This approach quantifies the relative evidence for the alternative versus the null hypothesis, thereby providing a more direct assessment of null effects. The analysis procedure is now described in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      (Q6) Once claims are more closely aligned with the data, the study will offer a valuable contribution to the field, given its use of relevant models and a well-established paradigm.

      We are grateful for the reviewer’s generous appraisal and insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I commend the authors on a well-structured, clear, and interesting piece of work. I have several questions and recommendations that, if addressed, I believe will strengthen the manuscript.

      We thank the reviewer for commending the organization of our paper.

      (2) Introduction: - Why use a zero-sum (Prisoner's Dilemma; PD) versus a mixed-motive game (e.g. Trust Task) to study cooperation? In a finite set of rounds, the dominant strategy can be to defect in a PD.

      We thank the reviewer for this helpful comment. We agree that both the rationale for using the repeated Prisoner’s Dilemma (rPDG) and the limitations of this framework should be clarified. We chose the rPDG to isolate the core motivational conflict between self-interest and joint welfare, as its symmetric and simultaneous structure avoids the sequential trust dependencies and reputation accumulation inherent to asymmetric tasks such as the Trust Game (King-Casas et al., 2005; Rilling et al., 2002).

      Although a finitely repeated rPDG theoretically favors defection, extensive prior research shows that cooperation can still emerge in long repeated interactions when players rely on learning and reciprocity rather than backward induction (Rilling et al., 2002; Fareri et al., 2015). Our design employed 120 consecutive rounds, allowing participants to update expectations about partner behavior and to establish stable reciprocity patterns over time. We have added the following clarification to the Introduction:

      “The rPDG provides a symmetric and simultaneous framework that isolates the motivational conflict between self-interest and joint welfare, avoiding the sequential trust and reputation dynamics characteristic of asymmetric tasks such as the Trust Game (Rilling et al., 2002; King-Casas et al., 2005)”

      (3) Methods:

      Did the participants know how long the PD would go on for?

      Were the participants informed that the partner was real/simulated?

      Were the participants informed that the partner was going to be the same for all rounds?

      We thank the reviewer for the meticulous review, which helped us present the experimental design and reporting details more clearly. We provide the following clarifications: I. Participants were not informed of the total number of rounds in the rPDG. This prevented endgame expectations and avoided distraction from counting rounds, which could introduce additional effects. II. Participants were told that their partner was another human participant in the laboratory. However, the partner’s behavior was predetermined by a computer program. This design enabled tighter experimental control and ensured consistent conditions across age groups, supporting valid comparisons. III. Participants were informed that they would interact with the same partner across all rounds, aligning with the essence of a multi-round interaction paradigm and stabilizing partner-related expectations. For transparency, we have clarified these points in the Methods and Materials section:

      “Participants were told that their partner was another human participant in the laboratory and that they would interact with the same partner across all rounds. However, in reality, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. Participants were not informed of the total number of rounds in the rPDG.”

      (4) The authors mention that an SVO was also recorded to indicate participant prosociality. Where are the results of this? Did this track game play at all? Could cooperativeness be explained broadly as an SVO preference that penetrated into game-play behaviour?

      We thank the reviewer for pointing this out. We agree that individual differences in prosociality may shape cooperative behavior, so we conducted additional analyses incorporating SVO. Specifically, we extended GLMM1 and LMM3 by adding the measured SVO as a fixed effect with random slopes, yielding GLMM<sub>sup</sub>3 and LMM<sub>sup</sub>6 (Tables 12–13). The results showed that higher SVO was associated with greater cooperation, whereas its effect on the reward for reciprocity was not significant. Importantly, the primary findings remained unchanged after controlling for SVO. These results indicate that cooperativeness in our task cannot be explained solely by a broad SVO preference, although a more prosocial orientation was associated with greater cooperation. We have reported these analyses and results in the Appendix Analysis section.

      (5) Why was AIC chosen rather than BIC to compare model dominance?

      We apologize for the lack of clarity. Both the Akaike Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978) are information-theoretic criteria for model comparison, and neither depends on whether the models being compared are nested (Burnham et al., 2002). We have added the following clarification to the Methods.

      “We chose to use the AICc as the metric of goodness-of-fit for model comparison for the following statistical reasons. First, BIC is derived based on the assumption that the “true model” must be one of the models in the limited model set one compares (Burnham et al., 2002; Gelman & Shalizi, 2013), which is unrealistic in our case. In contrast, AIC does not rely on this unrealistic “true model” assumption and instead selects out the model that has the highest predictive power in the model set (Gelman et al., 2014). Second, AIC is also more robust than BIC for finite sample size (Vrieze, 2012).”
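
For reference, the three criteria mentioned here can be computed from a model's maximized log-likelihood using their standard definitions. The sketch below is illustrative only; the function name and the example numbers are hypothetical and do not come from the study.

```python
import math

def information_criteria(log_lik, k, n):
    """Standard information criteria for a fitted model.

    log_lik: maximized log-likelihood
    k: number of free parameters
    n: number of observations (requires n > k + 1 for AICc)
    """
    aic = 2 * k - 2 * log_lik
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
    bic = k * math.log(n) - 2 * log_lik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Made-up example: 4 free parameters fitted to 120 trials
example = information_criteria(log_lik=-60.0, k=4, n=120)
```

Lower values indicate a better trade-off between fit and complexity under each criterion. BIC penalizes each parameter by ln(n), which exceeds the AIC penalty of 2 whenever n is larger than about 7.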

      (6) I believe the model fitting procedure might benefit from hierarchical estimation, rather than maximum likelihood methods. Adolescents in particular seem to show multiple outliers in a^+ and w^+ at the lower end of the distributions in Figure S2. There are several packages to allow hierarchical estimation and model comparison in MATLAB (which I believe is the language used for this analysis;

      see https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007043).

      We thank the reviewer for this helpful comment and for referring us to relevant methodological work (Piray et al., 2019). We have addressed this point by incorporating hierarchical Bayesian estimation, which effectively mitigates outlier effects and improves model identifiability. The results replicated those obtained with MLE fitting and further revealed group-level differences in key parameters. Please see our detailed response to Reviewer #1 Q1 for the full description of this analysis and results.

      (7) Results: Model confusion seems to show that the inequality aversion and social reward models were consistently confused with the baseline model. Is this explained or investigated? I could not find an explanation for this.

      The apparent overlap between the inequality aversion (Model 4) and social reward (Model 5) models in the recovery analysis likely arises because neither model includes a learning mechanism, making them unable to capture trial-by-trial adjustments in this dynamic task. Consequently, both were best fit by the baseline model. Please see Response to Reviewer #1 Q3 for related discussion.

      (8) Figures 3e and 3f show the correlation between asymmetric learning rates and age. It seems that both a^+ and a^- are around 0.35-0.40 for young adolescents, and this becomes more polarised with age. Could it be that with age comes an increasing discernment of positive and negative outcomes on beliefs, and younger ages compress both positive and negative values together? Given the higher stochasticity in younger ages (\beta), it may also be that these values simply represent higher uncertainty over how to act in any given situation within a social context (assuming the differences in groups are true).

      We appreciate this insightful interpretation. Indeed, both α+ and α- cluster around 0.35–0.40 in younger adolescents and become increasingly polarized with age, suggesting that sensitivity to positive versus negative feedback is less differentiated early in development and becomes more distinct over time. This interpretation remains tentative and warrants further validation. Based on this comment, we have revised the Discussion to include this developmental interpretation.

      We also clarify that in our model β denotes the inverse temperature parameter; higher β reflects greater choice precision and value sensitivity, not higher stochasticity. Accordingly, adolescents showed higher β values, indicating more value-based and less exploratory choices, whereas adults displayed relatively greater exploratory cooperation. These group differences were also replicated using hierarchical Bayesian estimation (see Response to Reviewer #1 Q1). In response to this comment, we have added a statement in the Discussion highlighting this developmental interpretation.

      “Together, these findings suggest that the differentiation between positive and negative learning rates changes with age, reflecting more selective feedback sensitivity in development, while higher β values in adolescents indicate greater value sensitivity. This interpretation remains tentative and requires further validation in future research.”

      (9) A parameter partial correlation matrix (off-diagonal) would be helpful to understand the relationship between parameters in both adolescents and adults separately. This may provide a good overview of how the model properties may change with age (e.g. a^+'s relation to \beta).

      We thank the reviewer for this helpful comment. We fully agree that a parameter partial correlation matrix can further elucidate the relationships among parameters. Accordingly, we conducted a partial correlation analysis and added the visually presented results to the revised manuscript as Figure 2-figure supplement 4.

      (10) It would be helpful to have Bayes Factors reported with each statistical test given that several p-values fall between 0.01 and 0.10.

      We thank the reviewer for this important recommendation. We have conducted Bayes factor analyses and reported BF10 for all relevant post hoc comparisons. We also clarified our analysis in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      (11) Discussion: I believe the language around ruling out failures in mentalising needs to be toned down. RL models do not enable formal representational differences required to assess mentalising, but they can distinguish biases in value learning, which in itself is interesting. If the authors were to show that more complex 'ToM-like' Bayesian models were beaten by RL models across the board, and this did not differ across adults and adolescents, there would be a stronger case to make this claim. I think the authors either need to include Bayesian models in their comparison, or tone down their language on this point, and/or suggest ways in which this point might be more thoroughly investigated (e.g., using structured models on the same task and running comparisons: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087619).

      We thank the reviewer for the comments. Please see our response to Reviewer 1 (Appraisal & Discussion section) for details.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors may want to show the winning model earlier (perhaps near the beginning of the Results section, when model parameters are first mentioned).

      We thank the reviewer for this suggestion. We agree that highlighting the winning model early improves clarity. Currently, we have mentioned the winning model before the beginning of the Results section. Specifically, in the penultimate paragraph of the Introduction we state:

      “We identified the asymmetric RL learning model as the winning model that best explained the cooperative decisions of both adolescents and adults.”

      Reviewer #3 (Recommendations for the authors):

      (1) In addition to the points mentioned above, I suggest the following:

      Clarify plots by clearly explaining each variable. In particular, the indices 1 vs. 1,2 vs 1,2,3 were not immediately understandable.

      We thank the reviewer for this suggestion. We agree that the indices were not immediately clear. We have revised the figure captions (Figures 1 and 4) to define these terms explicitly:

      “The x-axis represents the consistency of the partner’s actions in previous trials (t<sub>−1</sub>: last trial; t<sub>−1,2</sub>: last two trials; t<sub>−1,2,3</sub>: last three trials).”

      (2) It's unclear why the index stops at 3. If this isn't the maximum possible number of consecutive cooperation trials, please consider including all relevant data, as adolescents might show a trend similar to adults over more trials.

      We thank the reviewer for raising this point. In our exploratory analyses, we also examined longer streaks of consecutive partner cooperation or defection (up to four or five trials). Two empirical considerations led us to set the cutoff at three in the final analyses. First, the influence of partner behavior diminished sharply with temporal distance. In both GLMMs and LMMs, coefficients for earlier partner choices were small and unstable, and their inclusion substantially increased model complexity and multicollinearity. This recency pattern is consistent with learning and decision models emphasizing stronger weighting of recent evidence (Fudenberg & Levine, 2014; Fudenberg & Peysakhovich, 2016). Second, streaks longer than three were rare, especially among some participants, leading to data sparsity and inflated uncertainty. Including these sparse conditions risked biasing group estimates rather than clarifying them. Balancing informativeness and stability, we therefore restricted the index to three consecutive partner choices in the main analyses, which we believe sufficiently capture individuals’ general tendencies in reciprocal cooperation.
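
One way to code such a capped streak index is sketched below. It is a hypothetical illustration of the idea of counting how many immediately preceding trials share the partner's most recent choice, with a cap of three; the exact coding used in the analyses may differ.

```python
def partner_streaks(partner_choices, cap=3):
    """For each trial, describe the partner's behavior on the preceding trials.

    Returns a list of (last_choice, streak_length) pairs, where streak_length
    counts how many immediately preceding trials share last_choice, capped at
    `cap`. The first trial has no history and is coded as (None, 0).
    """
    out = []
    for t in range(len(partner_choices)):
        if t == 0:
            out.append((None, 0))
            continue
        last = partner_choices[t - 1]
        length = 1
        # extend the streak backwards while earlier choices match and cap allows
        while (length < cap and t - 1 - length >= 0
               and partner_choices[t - 1 - length] == last):
            length += 1
        out.append((last, length))
    return out
```

Capping the index pools all longer streaks into the top category, which avoids estimating effects from the sparse conditions described above.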

      (3) The term "reciprocity" may not be necessary. Since it appears to reflect a general preference for cooperation, it may be clearer to refer to the specific behavior or parameter being measured. This would also avoid confusion, especially since adolescents do show negative reciprocity in response to repeated defection.

      We thank the reviewer for this comment. In our work, we compute the intrinsic reward for reciprocity as p × ω, where p is the partner cooperation expectation and ω is the cooperation preference. In the rPDG, this value framework manifests as a reciprocity-derived reward: sustained mutual cooperation maximizes joint benefits, and the resulting choice pattern reflects a value for reciprocity, contingent on the expected cooperation of the partner. This quantity enters the trade-off between U<sub>cooperation</sub> and U<sub>defection</sub> and captures the participant’s intrinsic reward for reciprocity versus the additional monetary payoff of defection. Therefore, we consider “reciprocity” an appropriate term for this construct.

      (4) Interpretation of parameters should closely reflect what they specifically measure.

      We thank the reviewer for pointing this out. We have refined the relevant interpretations of parameters in the current Results and Discussion sections.

      (5) Prior research has shown links between Theory of Mind (ToM) and cooperation (e.g., Martínez-Velázquez et al., 2024). It would be valuable to test whether this also holds in your dataset.

      We thank the reviewer for this thoughtful comment. Although we did not directly measure participants’ ToM, our design allowed us to estimate participants’ trial-by-trial inferences (i.e., expectations) about their partner’s cooperation probability. We therefore treat these cooperation expectations as an indirect proxy for belief inference, which is related to ToM processes. To test whether this belief-inference component relates to cooperation in our dataset, we conducted a further exploratory analysis (GLMM<sub>sup</sub>4) in which participants’ choices were regressed on their cooperation expectations, group, and the group × cooperation-expectation interaction, controlling for trial number and gender, with random effects. Consistent with the ToM–cooperation link in prior research (Martínez-Velázquez et al., 2024), participants’ expectations about their partner’s cooperation significantly predicted their cooperative behavior (Table 14), suggesting that decisions were shaped by social learning about others’ inferred actions. Moreover, the interaction between group and cooperation expectation was not significant, indicating that this inference-driven social learning process likely operates similarly in adolescents and adults. This aligns with our primary modeling results showing that both age groups update beliefs via an asymmetric learning process. We have reported these analyses in the Appendix Analysis section.

      (6) More informative table captions would help the reader. Please clarify how variables are coded (e.g., is female = 0 or 1? Is adolescent = 0 or 1?), to avoid the need to search across the manuscript for this information.

      We thank the reviewer for raising this point. We have added clear and standardized variable coding in the table notes of all tables to make them more informative and avoid the need to search the paper. We have ensured consistent wording and formatting across all tables.

      (7) I hope these comments are helpful and support the authors in further strengthening their manuscript.

      We thank the three reviewers for their comments, which have been helpful in strengthening this work.

      References

      (1) Fudenberg, D., & Levine, D. K. (2014). Recency, consistent learning, and Nash equilibrium. Proceedings of the National Academy of Sciences of the United States of America, 111(Suppl. 3), 10826–10829. https://doi.org/10.1073/pnas.1400987111.

      (2) Fudenberg, D., & Peysakhovich, A. (2016). Recency, records, and recaps: Learning and nonequilibrium behavior in a simple decision problem. ACM Transactions on Economics and Computation, 4(4), Article 23, 1–18. https://doi.org/10.1145/2956581

      (3) Hackel, L., Doll, B., & Amodio, D. (2015). Instrumental learning of traits versus rewards: Dissociable neural correlates and effects on choice. Nature Neuroscience, 18, 1233–1235. https://doi.org/10.1038/nn.4080

      (4) Icenogle, G., Steinberg, L., Duell, N., Chein, J., Chang, L., Chaudhary, N., Di Giunta, L., Dodge, K. A., Fanti, K. A., Lansford, J. E., Oburu, P., Pastorelli, C., Skinner, A. T., Sorbring, E., Tapanya, S., Uribe Tirado, L. M., Alampay, L. P., Al-Hassan, S. M., Takash, H. M. S., & Bacchini, D. (2019). Adolescents’ cognitive capacity reaches adult levels prior to their psychosocial maturity: Evidence for a “maturity gap” in a multinational, cross-sectional sample. Law and Human Behavior, 43(1), 69–85. https://doi.org/10.1037/lhb0000315

      (5) Krekelberg, B. (2024). Matlab Toolbox for Bayes Factor Analysis (v3.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.13744717

      (6) Martínez-Velázquez, E. S., Ponce-Juárez, S. P., Díaz Furlong, A., & Sequeira, H. (2024). Cooperative behavior in adolescents: A contribution of empathy and emotional regulation? Frontiers in Psychology, 15, 1342458. https://doi.org/10.3389/fpsyg.2024.1342458

      (7) Tervo-Clemmens, B., Calabro, F. J., Parr, A. C., et al. (2023). A canonical trajectory of executive function maturation from adolescence to adulthood. Nature Communications, 14, 6922. https://doi.org/10.1038/s41467-023-42540-8

      (8) King-Casas, B., Tomlin, D., Anen, C., Camerer, C. F., Quartz, S. R., & Montague, P. R. (2005). Getting to know you: reputation and trust in a two-person economic exchange. Science, 308(5718), 78-83. https://doi.org/10.1126/science.1108062

      (9) Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35(2), 395–405. https://doi.org/10.1016/s0896-6273(02)00755-9

      (10) Fareri, D. S., Chang, L. J., & Delgado, M. R. (2015). Computational substrates of social value in interpersonal collaboration. Journal of Neuroscience, 35(21), 8170-8180. https://doi.org/10.1523/JNEUROSCI.4775-14.2015

      (11) Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705

      (12) Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

      (13) Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer. https://doi.org/10.1007/b97636

      (14) Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x

      (15) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018

      (16) Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Psychological Methods, 17(2), 228–243. https://doi.org/10.1037/a0027127

    1. Reference Guide: The Google Ad Grants Program for Nonprofit Associations

      Executive Summary

      The Google Ad Grants program offers French nonprofit associations (associations loi 1901) a free advertising allowance of 10,000 dollars per month on the Google search engine.

      Despite its considerable potential for raising awareness, recruiting volunteers, or collecting funds, the program remains largely underused in France, with only 2,000 to 3,000 active associations out of the millions that exist.

      This document details how paid search works, the technical eligibility criteria for organizations, the four-step activation process, and the best strategies for structuring effective campaigns.

      It also highlights the program's limits, notably the priority given to paying advertisers and the need for rigorous management to maximize the impact of the daily credit of 329 dollars.

      --------------------------------------------------------------------------------

      1. Fundamentals of Paid Search (SEA)

      The Google Ad Grants program falls under paid search (SEA, search engine advertising), which should be distinguished from organic search (SEO, search engine optimization).

      Key Differences: SEA vs SEO

      | Characteristic | Paid Search (SEA) | Organic Search (SEO) |
      | --- | --- | --- |
      | Position | Top of the page (sponsored results) | Below the sponsored ads |
      | Time to results | Short term (immediate after launch) | Long term and uncertain |
      | Cost | Pay per click (covered by Ad Grants) | "Free" (requires time/content) |
      | Control | Precise choice of keywords and areas | Depends on Google's algorithm |

      Specifics of the Ad Grants Account

      Unlike a standard Google Ads account, the Ad Grants account has several particularities:

      Virtual allowance: The association spends no real money; Google deducts the costs from the $10,000 allowance.

      Serving hierarchy: Ad Grants ads appear below the paid ads of private companies or institutions that have a marketing budget.

      Under heavy competition (e.g., "collecte de dons", i.e. donation collection), it is sometimes impossible to serve ads at all if the ad slots are already filled by paying advertisers.

      --------------------------------------------------------------------------------

      2. Eligibility and Technical Criteria

      To benefit from the program, an organization must meet specific statutory and technical criteria.

      Eligible Organizations

      • Associations loi 1901 (French nonprofit associations).

      • Endowment funds (fonds de dotation) and foundations recognized as being of public utility.

      Exclusions: Government entities, hospitals, care centers, and schools are not directly eligible (except through a foundation or a dedicated association).

      Website Requirements

      Google performs a manual review of the website when the application is submitted. The site must have:

      1. Its own domain name: Sites hosted on free subdomains (e.g., .wix.com, .google.site) are rejected.

      2. Substantial content: A minimum of 5 pages is required.

      3. Institutional clarity: The mission and nonprofit status must be stated on the home page, on an "About" page, and in the footer.

      4. Technical performance: The site must be responsive (adapted to mobile devices) and load at a satisfactory speed (ideally a score above 50/100 on PageSpeed Insights).
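
      The four website requirements above can be turned into a simple pre-application checklist. The sketch below is only an illustration: the `SiteProfile` fields and the `eligibility_issues` function are hypothetical names, the values would have to be collected by hand or from an audit tool, and Google's actual review is manual and may weigh other factors.

      ```python
      from dataclasses import dataclass
      from typing import List

      # Free-subdomain suffixes cited as examples in the guide (not an exhaustive list).
      FREE_SUBDOMAIN_SUFFIXES = (".wix.com", ".google.site")


      @dataclass
      class SiteProfile:
          """Hypothetical summary of a nonprofit's website, filled in manually."""
          hostname: str
          page_count: int
          mission_on_home: bool
          mission_on_about: bool
          mission_in_footer: bool
          is_responsive: bool
          pagespeed_score: int  # 0-100 score from PageSpeed Insights


      def eligibility_issues(site: SiteProfile) -> List[str]:
          """Return the list of guide criteria the site does not yet meet."""
          issues: List[str] = []
          if site.hostname.lower().endswith(FREE_SUBDOMAIN_SUFFIXES):
              issues.append("site is hosted on a free subdomain")
          if site.page_count < 5:
              issues.append("fewer than 5 pages of content")
          if not (site.mission_on_home and site.mission_on_about and site.mission_in_footer):
              issues.append("mission/status missing from home page, About page, or footer")
          if not site.is_responsive:
              issues.append("site is not responsive")
          if site.pagespeed_score <= 50:
              # The guide states this as an ideal target rather than a hard rule.
              issues.append("PageSpeed Insights score is not above 50/100")
          return issues


      draft_site = SiteProfile("refuge-exemple.wix.com", 3, True, False, True, True, 40)
      for issue in eligibility_issues(draft_site):
          print("-", issue)
      ```

      An empty list means the site meets every criterion named in this guide; it does not guarantee approval.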

      --------------------------------------------------------------------------------

      3. Four-Step Activation Process

      Launching an Ad Grants account follows a structured path:

      1. Creating the Google for Nonprofits account (Google pour les associations): Preferably use a professional email address tied to the association's domain to simplify validation.

      2. Identity validation: Google verifies the association's legal status (via its RNA number). This step generally takes 24 hours.

      3. Activating Google Ad Grants: The website is submitted for review against the content and performance criteria. The turnaround ranges from 2 to 14 days.

      4. Final setup: Validation of the payment profile (no bank card required) and final access to the account.

      --------------------------------------------------------------------------------

      4. Campaign Strategy and Structure

      Effective management rests on a logical structure and on alignment between the user's intent and the content offered.

      The 4 Pillars of Success

      Targeting: Select relevant keywords (search volume > 50/month) that are specific to the cause, avoiding terms that are too generic or extremely competitive.

      Ads: Write compelling headlines (up to 15 variants) that echo the keywords typed by the user.

      Bidding: The "Maximize conversions" strategy must be used so that Google's algorithm can optimize ad serving.

      Tracking: A connection to Google Analytics is essential for measuring concrete actions (donations, volunteer sign-ups, downloads).

      Example Account Structure (Animal Shelter Case)

      Adoptions campaign: Separate ad groups for "Adopt a dog" and "Adopt a cat", each pointing to the corresponding page of the site.

      Volunteering campaign: Keywords about donating time, caring for animals, or nonprofit work.

      Brand campaign: Protects the association's name so that it consistently appears at the top for a direct search.

      --------------------------------------------------------------------------------

      5. Tools and Maintenance

      Sustaining performance requires complementary tools and regular monitoring.

      | Tool | Purpose | Difficulty level |
      | --- | --- | --- |
      | Google Keyword Planner | Find keywords and analyze their volume/competition. | Beginner |
      | AI (ChatGPT, Gemini) | Help with writing ad headlines and descriptions. | Beginner |
      | Google Analytics | Analyze visitor behavior after the click. | Intermediate |
      | Google Tag Manager | Install precise conversion tags without code. | Advanced |

      Management Tips

      Never delete a campaign: It is better to pause inactive campaigns, which preserves their history and saves time when reactivating them.

      Budget use: The $10,000 cap is spread out at $329 per day. Credits not used on a given day are permanently lost and cannot be carried over.
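
      The arithmetic behind the daily figure, and the cost of leaving credit unused, can be checked with a few lines of Python. This is a sketch under stated assumptions: the guide does not say how $329 is derived, so the annualized calculation below is only one computation that is consistent with it, and `lost_credit` is a hypothetical helper with made-up spend figures.

      ```python
      from typing import Iterable

      MONTHLY_ALLOWANCE_USD = 10_000  # monthly allowance stated in the guide
      DAILY_CAP_USD = 329             # daily cap stated in the guide

      # Assumption: spreading twelve monthly allowances evenly over 365 days.
      annualized_daily = MONTHLY_ALLOWANCE_USD * 12 / 365
      print(f"Annualized daily allowance: {annualized_daily:.2f} USD")  # 328.77 USD


      def lost_credit(daily_spend: Iterable[float], daily_cap: float = DAILY_CAP_USD) -> float:
          """Total credit forfeited, given that unused daily credit is not carried over."""
          return sum(max(daily_cap - min(spend, daily_cap), 0) for spend in daily_spend)


      # Hypothetical three-day spend: full use, partial use, no ads served.
      print(lost_credit([329, 100, 0]))  # 558
      ```

      Because unused credit disappears each day, a campaign that only serves ads on some days forfeits the remainder rather than banking it for later.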

      Access security: It is crucial to appoint several administrators to avoid losing the account when a staff member or volunteer leaves.

      --------------------------------------------------------------------------------

      6. Ethics and Transparency

      Although the ads are funded by Google, they carry the "Sponsored" label.

      This transparency is reinforced by the Google Ads Transparency Center, which lets the public view the ads run by any entity.

      The program is part of Google's corporate social responsibility (CSR) policy and acts as an in-kind donation in the form of advertising space.

    1. Abstract

      Cancer cells are heterogeneous, each harboring distinct molecular aberrations and depending on different genes for survival and proliferation. While successful targeted therapies have been developed based on driver DNA mutations, many patient tumors lack druggable mutations and have limited treatment options. Here, we hypothesize that new precision oncology targets may be identified through “expression-driven dependency”, whereby cancer cells with high expression of a targeted gene are more vulnerable to the knockout of that gene. We introduce a Bayesian approach, BEACON, to identify such targets by jointly analyzing global transcriptomic and proteomic profiles with genetic dependency data of cancer cell lines across 17 tissue lineages. BEACON identifies known druggable genes, e.g., BCL2, ERBB2, EGFR, ESR1, MYC, while revealing new targets confirmed by both mRNA- and protein-expression driven dependency. Notably, the identified genes show an overall 3.8-fold enrichment for approved drug targets and enrich for druggable oncology targets by 7- to 10-fold. We experimentally validate that the depletion of GRHL2, TP63, and PAX5 effectively reduces tumor cell growth and survival in their dependent cells. Overall, we present the catalog of expression-driven dependency targets as a resource for identifying novel therapeutic targets in precision oncology.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag011), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 4

      Reproducibility report for: Expression-Driven Genetic Dependency Reveals Targets for Precision Oncology
      Journal: GigaScience
      ID number/DOI: GIGA-D-25-00147
      Reviewer(s): Laura Caquelin, Department of Clinical Neuroscience, Karolinska Institutet, Sweden


      1. Summary of the Study

      The authors developed a Bayesian method called BEACON to integrate multi-omics data. The method was tested on cancer cell lines across 17 tissue types to identify expression-driven dependencies. The method recovered known drug targets and identified novel candidates. The study concludes this method provides a systematic approach to identify precision oncology targets.

      2. Scope of reproducibility

      According to our assessment the primary objective is: to identify expression-driven dependencies across cancer cell lines from multiple lineages enabling the discovery of genes whose expression levels correlate with cancer cell dependency scores.

      - Outcome: Identification of genes with significant expression-driven dependencies across pan-lineage cancer cell lines.
      - Analysis method outcome: "BEACON calculated the Bayesian correlation between the gene's expressions and CERES cancer dependency scores [25] across the pan-lineage cell lines. BEACON modeled expression levels and dependency scores as the bivariate Gaussians and used Markov Chain Monte Carlo (MCMC) sampling to estimate the correlation coefficient rho between them. Given the null hypothesis that the uncorrelated expression and dependency of a gene has the 0 rho coefficient, we statistically tested each gene's rho estimate obtained from the MCMC simulation as follows. Assume that the MCMC sampling is carried out for a null gene's expression and dependency, then we expect that the distribution of the rho estimate accumulated over the MCMC iterations will be centered at zero. Based on this rationale, we computed the z-score of i-th gene as the deviation of the MCMC estimate of rho from the expected (null) value (i.e., zero) in terms of the standard deviation observed in the simulated distribution, i.e., z(i) = rhoMCMC(i) / SDMCMC(i). Since the z-values, by nature, follow a normal distribution with zero-mean and unit-variance, then we computed the p-value for each gene's rho estimate as the probability of observing a value as extreme as the computed z-value for that gene. We multi-testing corrected the resulting p-values using the BH procedure for FDR." (page 19, Methods section / mRNA expression-driven dependency (GED))
      - Main result: "We first analyzed the pan-lineage GED by using mRNA levels and the corresponding dependency scores from 854 cell lines with available data across 17 lineages and identified 244 genes showing significant association (correlation coefficient, rho < -0.25, FDR < 0.05)" (page 7, Results section / Cancer vulnerability targets showing gene expression-driven dependency (GED))
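
      The testing procedure quoted above (z-score from the MCMC mean and standard deviation of rho, normal p-value, Benjamini-Hochberg correction) can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code: the function names are hypothetical, the p-value is assumed to be two-sided, and the example rho summaries are made-up numbers.

      ```python
      import math
      from typing import List


      def z_score(rho_mean: float, rho_sd: float) -> float:
          """z(i) = rhoMCMC(i) / SDMCMC(i): deviation of the estimate from the null value 0."""
          return rho_mean / rho_sd


      def two_sided_p(z: float) -> float:
          """P(|Z| >= |z|) for a standard normal Z (two-sidedness is an assumption)."""
          return math.erfc(abs(z) / math.sqrt(2.0))


      def benjamini_hochberg(p_values: List[float]) -> List[float]:
          """BH-adjusted p-values, returned in the same order as the input."""
          m = len(p_values)
          order = sorted(range(m), key=lambda i: p_values[i])
          adjusted = [0.0] * m
          running_min = 1.0
          # Walk from the largest p-value down, enforcing monotonicity.
          for rank in range(m, 0, -1):
              i = order[rank - 1]
              running_min = min(running_min, p_values[i] * m / rank)
              adjusted[i] = running_min
          return adjusted


      # Made-up (mean, SD) summaries of rho for three hypothetical genes.
      rho_summaries = [(-0.40, 0.05), (-0.10, 0.08), (0.02, 0.06)]
      p_values = [two_sided_p(z_score(mean, sd)) for mean, sd in rho_summaries]
      fdr = benjamini_hochberg(p_values)
      for (mean, _), q in zip(rho_summaries, fdr):
          print(f"rho = {mean:+.2f}, significant = {mean < -0.25 and q < 0.05}")
      ```

      The final filter mirrors the reported criterion (rho < -0.25, FDR < 0.05). Because the standard deviation comes from the MCMC draws, both z and the p-value inherit run-to-run sampling variability.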

      3. Availability of Materials

      a. Data
      - Data availability: Open
      - Data completeness: Complete, all data necessary to reproduce main results are available.
      - Access method: Repository
      - Repository: https://doi.org/10.6084/m9.figshare.19700056.v2
      - Data quality: Structured

      b. Code
      - Code availability: Open
      - Programming Language(s): R
      - Repository link: https://github.com/Huang-lab/BEACON
      - License: MIT license
      - Repository status: Public
      - Documentation: Readme file


      4. Computational environment of reproduction analysis
      - Operating system for reproduction: macOS 15.5
      - Programming Language(s): R
      - Code implementation approach: Using shared code
      - Version environment for reproduction: R version 4.5.0/RStudio 2025.05.1

      5. Results

      5.1 Original study results
      - Results 1: Supplementary table S2

      5.2 Steps for reproduction
      -> Run the code PanLineageMCMC.R

      • Issue 1: File import paths and incorrect file name -- Resolved: The original code used fixed file paths that only worked on one specific computer, which caused errors when the code was run elsewhere. To fix this, I recommended using relative paths, which are resolved from the location of the script. This way, the code can be run on any computer without editing the paths each time.

      ------------------ Start of script ------------------
      sam.dep = read.csv(file.path(getwd(), "DepMap_data", "sample_info.csv"))
      ------------------- End of script -------------------

      • Issue 2: Missing function "intsect" at line 162 -- Resolved: The script called a function intsect that was not defined, leading to an error. Upon request, the authors provided the missing function and added it to the main script (PanLineageMCMC.R).

      • Issue 3: Output directory not created. -- Resolved: The script attempted to write output files to a directory that was not created beforehand. This caused errors during the loop execution when trying to save results. A directory check and automatic creation script was added. If the output folder does not exist, it is now created automatically before the loop runs.

      ------------------ Start of script ------------------
      dir_path <- paste0('../out/jags.nadapt',n.adapt,'.update',n.update,'.mcmc ',n.iter,'.simulation_SD_22Q2')
      if (!dir.exists(dir_path)) {
        dir.create(dir_path, recursive = TRUE)
      }
      ------------------- End of script -------------------

      5.3 Statistical comparison Original vs Reproduced results
      - Results: Table.mRNA.dependency.Bayesian.pancancer file attached
      - Comments: The Bayesian PanCancer analysis was re-run, but only on the 244 significant genes listed in Supplementary Table S2, not on the full set of 17,285 genes. This choice was made due to limited computational resources, as running the full model would have required an estimated 100 hours.
      - Errors detected: -
      - Statistical Consistency: Among the 244 significant genes originally reported, the reproduced analysis confirmed the statistical significance of these same genes. However, the exact numerical values (mean, standard deviation, Z value, P-value and adjusted P-value) differed slightly. These discrepancies are expected given the nature of Bayesian inference, the absence of a random seed, and the relatively low number of MCMC iterations used (n.iter = 500). These settings may not be sufficient to ensure full convergence or reproducibility of posterior estimates, and the results should be interpreted with caution. We were unable to compare the rho values because they were not available in the provided Supplementary table S2, nor extracted in the R code for inclusion in the resulting output files.


      6. Conclusion

      - Summary of the computational reproducibility review: The results of Supplementary table S2 in the original study were partially reproduced. We were able to confirm the statistical significance of the 244 genes reported in Supplementary Table S2 using the Bayesian PanCancer model in the provided code. However, the numerical results were not always identical. This is expected because Bayesian methods involve random sampling, the original code did not set a fixed random seed, and the number of iterations used was relatively low. Furthermore, the rho values were not available for comparison, limiting a full reproducibility assessment. Several technical issues were also fixed during the reproduction process (hardcoded file paths, a missing function, and the absence of output directories) so that the code could run correctly on a different system. Due to computational limitations, the full model on all 17,285 genes was not run.

      - Recommendations for authors: While the original analysis code was successfully used to confirm the statistical significance of the 244 genes, we recommend several improvements to enhance reproducibility:
      -- Code annotation: Adding more detailed comments within the scripts would help users understand the logic behind each step and the purpose of specific commands or operations.
      -- Set a random seed: Include set.seed() in all scripts to improve reproducibility across different runs.
      -- Specify R and package versions: Provide the R version and exact package versions needed to run the code, via a requirements file for example.
      -- Use relative file paths: Ensure that all necessary folders and functions are created or included by default to avoid path issues.
      -- Increase MCMC robustness: Use a higher number of iterations and appropriate parameter settings to ensure better convergence and stability of posterior estimates.
      -- Inform users about computation time: Clearly indicate in the README or publication the expected runtime of the code, especially if it requires several hours or days to complete.
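
      The effect of the seed recommendation can be illustrated with a small sampling experiment. The sketch below is a Python analogue for illustration only (the reviewed code is in R, where the equivalent call is set.seed()); `monte_carlo_mean` is a hypothetical helper that averages 500 Gaussian draws, standing in for a short MCMC run.

      ```python
      import random
      import statistics
      from typing import Optional


      def monte_carlo_mean(n_draws: int, seed: Optional[int] = None) -> float:
          """Mean of n_draws standard-normal samples from a dedicated generator.

          With seed=None the generator is seeded from system entropy, so repeated
          calls give slightly different estimates; a fixed seed makes them identical.
          """
          rng = random.Random(seed)
          draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
          return statistics.fmean(draws)


      # Unseeded runs: estimates vary from call to call.
      print(monte_carlo_mean(500), monte_carlo_mean(500))

      # Seeded runs: the same seed reproduces the same estimate exactly.
      print(monte_carlo_mean(500, seed=2025) == monte_carlo_mean(500, seed=2025))  # True
      ```

      A fixed seed makes a run repeatable but does not make it more accurate; reducing the spread between unseeded runs still requires more iterations.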

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers and editors for their careful evaluation of our manuscript and their positive comments on the importance and rigor of the work. Below you will find our point-by-point response to each reviewer's suggestions. We believe that we have addressed (in the response and the revised manuscript) all of the concerns. Please note that in some cases, we have numbered a reviewer's comments for clarity, however beyond this, we have not altered any of the reviewers' text.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Lo et al., report a high-throughput functional profiling study on the gene encoding for argininosuccinate synthase (ASS1), done in a yeast experimental system. The study design is robust (see lines 141-143, main text, Methods), whereby "approximately three to four independent transformants of each variant would be isolated and assayed." (lines 140 - 141, main text, Methods). Such a manner of analysis will allow for uncertainty of the functional readout for the tested variants to be accounted for.

      This is an outstanding study providing insights on the functional landscape of ASS1. Functionally impaired ASS1 may cause citrullinemia type I, and disease severity varies according to the degree of enzyme impairment (line 30, main text; Abstract). Data from this study forms a valuable resource in allowing for functional interpretation of protein-altering ASS1 variants that could be newly identified from large-scale whole-genome sequencing efforts done in biobanks or national precision medicine programs. I have some suggestions for the Authors to consider:

      1. The specific function of ASS1 is to condense L-citrulline and L-aspartate to form argininosuccinate. Instead of measuring either depletion of substrate or formation of product, the Authors elected to study 'growth' of the yeast cells. This is a broader phenotype which could be determined by other factors outside of ASS1. Whereas I agree that the experiments were beautifully done, the selection of an indirect phenotype such as ability of the yeast cells to grow could be more vigorously discussed.

      We appreciate the reviewer's point regarding the indirect nature of growth as a functional readout. In our system, yeast growth is tightly and specifically coupled to ASS enzymatic activity. The strains used are isogenic and lack the native yeast argininosuccinate synthetase, such that arginine biosynthesis, and therefore yeast replication on minimal medium lacking arginine, depends exclusively on the activity of human ASS1. Under these defined and limiting conditions, growth provides a quantitative proxy for ASS1 function. However, we acknowledge that this assay does not resolve specific molecular mechanisms underlying reduced function, such as altered catalytic activity versus effects on protein stability. We have updated the text to clarify these points.

      "While growth is an indirect phenotype relative to direct measurement of substrate turnover or product formation, it is tightly coupled to ASS enzymatic activity in this system and is expected to be impaired by amino acid substitutions that reduce catalytic activity or protein stability. Therefore, growth on minimal medium lacking arginine is a quantitative measure of ASS enzyme function, allowing the impact of ASS1 missense variants to be assessed at scale through a high-throughput growth assay, in a single isogenic strain background, under controlled, defined conditions that limit confounding factors unrelated to ASS1 activity. We expect that the assay will detect reductions in both catalytic activity and protein stability but will not distinguish between these mechanisms."

      2. One of the key reasons why studies such as this one are valuable is due to the limitations of current variant classification methods that rely on 'conservation' status of amino acid residues to predict which variants might be 'pathogenic' and which variants might be 'likely benign'. However, there are serious limitations, and Figures 2 and 6 in the main text show this clearly. Specifically, there is an appreciable number of variants that, despite being classified as "ClinVar Pathogenic", were shown by the assay to be unlikely to be functionally impaired. This should be discussed vigorously. Could these inconsistencies be potentially due to the read out (growth instead of a more direct evaluation of ASS1 function)?

      We interpret this discrepancy as reflecting a sensitivity limitation of the growth-based readout rather than a fundamental disagreement between functional effect and clinical annotation. Specifically, we believe that our assay is unable to resolve the very mildest hypomorphic variants from true wild type, i.e., the residual activity of these variants is sufficient to fully support yeast growth under the conditions used. On this basis, we have chosen not to treat wild-type-like growth in our assay as informative for benignity; conversely, reduced growth provides evidence supporting pathogenicity (all clinically validated variants examined in this range are pathogenic).

      We have revised the manuscript to clarify this point explicitly and to frame these variants as lying outside the effective resolution limit of the assay rather than representing true false positives. Additional discussion of this limitation and its implications is provided in our responses to Reviewer 2 (points 1 and 4) along with specific changes made to the text.

      3. Figure 3 is very interesting, showing a continuum of functional readout ranging from 'wild-type' to 'null'. It is very interesting that the Authors used a threshold of less than 0.85 as functionally hypomorphic. What does this mean? It would be very nice if they have data from patients carrying two hypomorphic ASS1 alleles, and correlate their functional readout with severity of clinical presentation. The reader might be curious as to the clinical presentation of individuals carrying, for example, two ASS1 alleles with normalized growth of 0.7 to 0.8.

      I hope you will find these suggestions helpful.

      We thank the reviewer for this thoughtful comment. Figure 3 indeed illustrates a continuum of functional effects, and we agree that careful interpretation of the thresholds used is important. To clarify the rationale for the hypomorphic threshold, the interpretation of intermediate growth values, and to emphasize that these labels reflect only behavior in the functional assay, we have rewritten the relevant section of the Results:

      "The normalized growth scores of the 2,193 variants tested in our functional assay form a clear bimodal distribution (Figure 3), with two distinct peaks corresponding to functional extremes, as is commonly reported in large-scale functional assays of protein function [9, 10]. The smaller peak, centered around the null control (normalized growth = 0), represents variants that fail to support growth in the assay (growth 0.85). Variants with growth values falling between these two peak-based thresholds display partial functional impairment and are classified as functionally hypomorphic (n = 323). Crucially, these classifications are entirely derived from the observed peaks in the distribution of growth values and reflect differences in functional activity under the assay conditions. They do not provide direct evidence for clinical pathogenicity or benignity and should not be used for clinical variant interpretation without proper benchmarking against clinical reference datasets, as implemented below within an OddsPath framework."

      We agree with the reviewer that correlating functional measurements with clinical severity in individuals carrying two hypomorphic ASS1 alleles would be highly informative, particularly given that ASS1 deficiency is an autosomal recessive disorder. While mild hypomorphic variants (for example, variants with normalized growth values of 0.7-0.8 in our assay) could plausibly contribute to disease when paired with a complete loss-of-function allele, systematic analysis of combinatorial genotype effects and genotype-phenotype correlations is beyond the scope of the present study, which focuses on the functional effects of individual variants. We view this as an important direction for future work.
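      The peak-based labeling described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `classify_growth` is hypothetical, the 0.85 upper boundary comes from the text, and the lower (amorphic) boundary is not legible in this excerpt, so it is a required argument rather than a built-in default.

```python
def classify_growth(score, amorphic_max, unimpaired_min=0.85):
    """Label a normalized growth score by assay behavior only.

    These labels describe growth in the yeast assay; they are not
    clinical classifications.
    amorphic_max   -- upper bound of the null-centered peak (assumed input;
                      the value is not given in this excerpt)
    unimpaired_min -- lower bound of the wild-type-centered peak (0.85 in the text)
    """
    if not amorphic_max < unimpaired_min:
        raise ValueError("amorphic_max must lie below unimpaired_min")
    if score <= amorphic_max:
        return "amorphic"
    if score > unimpaired_min:
        return "unimpaired"
    return "hypomorphic"


if __name__ == "__main__":
    # 0.10 is a placeholder boundary chosen only for this demonstration.
    for s in (0.0, 0.75, 0.90):
        print(s, classify_growth(s, amorphic_max=0.10))
```

      Under these assumptions, the reviewer's example of an allele with normalized growth of 0.7 to 0.8 would be labeled hypomorphic, which says nothing by itself about clinical severity.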

      Reviewer #1 (Significance (Required)):

      This is an outstanding study providing insights on the functional landscape of ASS1. Functionally impaired ASS1 may cause citrullinemia type I, and disease severity varies according to the degree of enzyme impairment (line 30, main text; Abstract). Data from this study forms a valuable resource in allowing for functional interpretation of protein-altering ASS1 variants that could be newly identified from large-scale whole-genome sequencing efforts done in biobanks or national precision medicine programs.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Lo et al. characterize the phenotypic effect of ~90% of all possible ASS1 missense mutations using an elegant yeast-based system, and use this dataset to aid the interpretation of clinical ASS1 variants. Overall, the manuscript is well-written and the experimental data are interpreted rigorously. Of particular interest is the identification of pairs of deleterious alleles that rescue ASS1 activity in trans. My comments mainly pertain to the relevance of using a yeast screening methodology to infer functional effects of human ASS1 mutations.

      1. Since human ASS1 is heterologously expressed in yeast for this mutational screen, direct comparison of native expression levels between human cells and yeast is not possible. Could the expression level of human ASS1 (driven by the pARG1 promoter) in yeast alter the measured fitness defect of each variant? For instance, if ASS1 expression in yeast is sufficiently high to mask modest reductions in catalytic activity, such variants may be misclassified as hypomorphic rather than amorphic. Conversely, if expression is intrinsically low, even mild catalytic impairments could appear deleterious. While it is helpful that the authors used non-human primate SNV data to calibrate their assay, experiments could be performed to directly address this possibility.

      The nature of the relationship between yeast growth and availability of functional ASS1 could also influence the interpretation of results from the yeast-based screen. Does yeast growth scale proportionately with ASS1 enzymatic activity?

      We completely agree that the expression level of human ASS1 in yeast could influence the measured fitness effects of individual variants. We expect the rank ordering of variants in our growth assay to reflect their relative enzymatic activity (i.e. a monotonic relationship) but acknowledge that the precise mapping between activity and growth is unknown and may include ceiling and floor effects that limit the assay's dynamic range. As the reviewer notes, under high expression conditions moderate loss-of-function variants could appear indistinguishable from wild type (ceiling effect), whereas under lower expression the same variants could behave closer to the null control (floor effect).
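      The ceiling and floor effects described here can be illustrated with a toy model. This sketch is purely hypothetical: the true mapping between ASS activity and growth is unknown, and the function `toy_growth`, its clipped-linear form, and the `demand` and `floor` parameters are assumptions made only to show how one monotonic relationship can mask or exaggerate the same defect at different expression levels.

```python
def toy_growth(activity, expression, demand=1.0, floor=0.1):
    """Toy monotonic map from enzyme activity to normalized growth.

    activity   -- relative catalytic activity of the variant (1.0 = wild type)
    expression -- relative expression level of the enzyme
    demand     -- flux needed for full growth (assumed)
    floor      -- fraction of demand below which no growth is seen (assumed)
    """
    supply = activity * expression / demand
    if supply < floor:
        return 0.0           # floor effect: looks like the null control
    return min(1.0, supply)  # ceiling effect: looks like wild type


if __name__ == "__main__":
    # The same half-active variant at three expression levels.
    for expr in (2.0, 1.0, 0.1):
        print(expr, toy_growth(0.5, expr))
```

      In this toy setting a variant with half of wild-type activity appears fully functional at twofold expression and null-like at one-tenth expression, while the rank ordering of variants is preserved at any fixed expression level.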

      In our system, ASS1 is expressed from the pARG1 promoter, chosen under the assumption that the native expression level of ARG1 (the yeast ASS1 ortholog) is appropriately tuned for yeast growth. Crucially, rather than assuming a fixed mapping from assay growth to clinical pathogenicity (given potential nonlinearities in the relationship between ASS function and growth), we benchmark the assay against external data, including known pathogenic and benign variants and non-human primate SNVs, to calibrate thresholds and guide interpretation within an OddsPath framework. This benchmarking indicates that ceiling effects are likely present, with some mild loss-of-function pathogenic variants appearing indistinguishable from wild type in the growth assay. We explicitly account for this by not using high-growth scores as evidence toward benignity. We have made the following changes to the manuscript:

      "A subset of clinically pathogenic ASS1 variants exhibit near-wild-type growth in our yeast assay. In general, we expect a monotonic relationship between ASS function and yeast growth, but with the potential for floor and ceiling effects that constrain the assay's dynamic range. In this context, we interpret high-growth pathogenic variants as likely causing mild loss of function that cannot be distinguished from wild type in our assay"

      "Based on these findings and given that 22/56 pathogenic variants show >85% growth, we conclude that growth above this threshold should not be used as evidence toward benignity."

      1. It would be helpful to add an additional diagram to Figure 1A explaining how the screen was performed, in particular: when genotype and phenotype were measured, relative to plating on selective vs non-selective media? This is described in "Variant library sequence confirmation" and "Measuring the growth of individual isolates" of the Methods section but could also be distilled into a diagram.

      We thank the reviewer for this helpful suggestion. We have updated Figure 1 by adding a new schematic panel (Figure 1C) that distills the experimental workflow into a visual overview. This diagram is intended to complement the detailed descriptions in the Methods and improve clarity for the reader.

      1. The authors rationalize the biochemical consequences of ASS1 mutations in the context of ASS1 per se - for example, mutations in the active site pocket impair substrate binding and therefore catalytic activity, which is expected. Does ASS1 physically interact with other proteins in human cells, and could these interactions be altered in the presence of specific ASS1 mutations? Such effects may not be captured by performing mutational scanning in yeast.

      We are not aware of any specific protein-protein interactions involving ASS that are required for its enzymatic function. However, we agree that ASS could engage in non-essential interactions with other human proteins that might be altered by specific missense variants and that such interactions would not necessarily be captured in a yeast-based assay.

      Importantly, our complementation system depends on human ASS providing the essential enzymatic activity required for arginine biosynthesis in yeast. If ASS1 required obligate human-specific protein interactions to function, even the wild-type enzyme would fail to support yeast growth, which is clearly not the case. We therefore conclude that the assay robustly reports on the intrinsic enzymatic activity of ASS, while acknowledging that non-essential human-specific interactions may not be assessed. We have updated the manuscript to reflect this point.

      "Importantly, successful functional complementation indicates that ASS enzymatic activity does not depend on any obligate human-specific protein interactions."

      1. The authors note that only a small number (2/11) of mutations at the ASS1 monomer-monomer interface lead to growth defects in yeast. It would be helpful for the authors to discuss this further.

      As discussed in response to the reviewer's comments on the relationship between ASS activity and yeast growth (point 1 above), we expect growth to be a monotonic but nonlinear function of enzymatic activity, with potential ceiling effects at high activity. Under this model, variants causing weak or moderate loss of function may remain indistinguishable from wild type when residual activity is sufficient to support normal growth. We favor this explanation for the observation that only 2/11 interface variants show reduced growth, as many pathogenic interface substitutions are associated with milder disease presentations, consistent with higher residual enzyme function. Consistent with this interpretation, variants affecting the active site, where substitutions are expected to cause large reductions in catalytic activity, are readily detected by the assay.

      Although we cannot exclude partial buffering of dimerization defects in yeast, we interpret the reduced sensitivity to interface variants primarily as a general limitation of growth-based assays. Accordingly, our decision not to use growth >85% as evidence toward benignity is conservative relative to approaches that would classify high-growth variants as benign except at the monomer-monomer interface, avoiding reliance on structural subclassification and minimizing the risk of false benign interpretation. Reduced growth, by contrast, provides strong evidence of loss of ASS1 function and pathogenicity, validated under the OddsPath framework.

      We have updated the Results and Discussion sections to clarify these points (also see response to the reviewer's point 1).

      "A subset of clinically pathogenic ASS1 variants exhibit near-wild-type growth in our yeast assay. In general, we expect a monotonic relationship between ASS function and yeast growth, but with the potential for floor and ceiling effects that constrain the assay's dynamic range. In this context, we interpret high-growth pathogenic variants as likely causing mild loss of function that cannot be distinguished from wild type in our assay. Consistent with this view, many pathogenic variants with high assay growth are located at the monomer-monomer interface rather than the active site, and are associated with milder or later-onset clinical presentations, suggesting partial enzymatic impairment that is clinically relevant in humans but not resolved by the yeast assay."

      "Based on these findings and given that 22/56 pathogenic variants show >85% growth, we conclude that growth above this threshold should not be used as evidence toward benignity. Notably, this approach is conservative relative to treating high-growth variants as benign except at the monomer-monomer interface, avoiding reliance on structural subclassification and minimizing the risk of false benign interpretation arising from assay ceiling effects. Conversely, the variants with

      Reviewer #2 (Significance (Required)):

      This study presents the first comprehensive mutational profiling of human ASS1 and would be of broad interest to clinical geneticists as well as those seeking biochemical insights into the enzymology of ASS1. The authors' use of a yeast system to profile human mutations would be particularly useful for researchers performing deep mutational scans, given that it provides functional insights in a rapid and inexpensive manner.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Section 1 - Evidence, reproducibility, and clarity Summary This manuscript presents a comprehensive functional profiling of 2,193 ASS1 missense variants using a yeast complementation assay, providing valuable data for variant interpretation in the rare disease citrullinemia type I. The dataset is extensive, technically sound, and clinically relevant. The demonstration of intragenic complementation in ASS1 is novel and conceptually important. Overall, the study represents a substantial contribution to functional genomics and rare disease variant interpretation.

      Major comments 1. This is an exciting paper as it can provide support to clinicians to make actionable decisions when diagnosing infants. I have a few major comments, but I want to emphasize the label of "functionally unimpaired" variants to be misleading. The authors explain that there are several pathogenic ClinVar variants that fall into this category (above the >.85 growth threshold) but I think this category needs a more specific name and I would ask the authors to reiterate the shortcomings of the assay again in the Discussion section.

      We thank the reviewer for raising this important point. We agree that the label "functionally unimpaired" could be misleading if interpreted as implying clinical benignity rather than assay behavior. We have therefore clarified that this designation refers strictly to variant behavior in the yeast growth assay and does not imply absence of pathogenicity.

      In addition, we have expanded the Discussion to explicitly address the existence of clinically pathogenic variants with high growth scores (>0.85), emphasizing that these likely reflect a ceiling effect of the assay and represent a key limitation for interpretation. This clarification reiterates that high-growth scores should not be used as evidence toward benignity, while reduced growth provides strong functional evidence of pathogenicity. Relevant revisions are described in our responses to Reviewers 1 and 2.

      1. I think there's an important discussion to be had here, is the assay detecting variants that alter the function of ASS or is it detecting a complete ablation of enzymatic activity? The results might be strengthened with a follow-up experiment that identifies stably expressed ASS1 variants.

      We agree with the reviewer that distinguishing between stability and enzyme activity would be valuable information. Unfortunately, we do not currently have the resources to perform this type of large-scale study. We have acknowledged in the text that our assay does not distinguish between enzyme activity and protein stability:

      "We expect that the assay will detect reductions in both catalytic activity and protein stability, but will not distinguish between these mechanisms."

      At the very least, it would be great to see the authors replicate some of their interesting results from the high-throughput screen by down-selecting to ~12 variants of uncertain significance that could be newly considered pathogenic.

      We have included new analysis of all 25 VUS variants falling in the pathogenic range of our assay (Supplemental Table S7). Reclassification under current guidelines (in the absence of our data) shifts six variants to Pathogenic/Likely Pathogenic and 11 more are reclassified to Likely Pathogenic with the application of our functional data as PS3_Supporting. The remaining eight VUS are all reclassified to Likely Pathogenic when inclusion of homozygous PrimateAI-benign variants allows the assay to satisfy full PS3 criteria.

      1. I would ask the authors to provide more citations of the literature in the introduction of the manuscript. I would be especially interested in knowing more about human ASS being identified as a homolog of yeast ARG1, as they share little sequence similarity (27.5%) at the protein level. That said, I find the yeast complementation assay exciting.

      We thank the reviewer for this suggestion. Human ASS and yeast Arg1 catalyze the same biochemical reaction and share approximately 49% amino acid sequence identity. We have revised the Introduction to clarify this relationship and to note explicitly that the Saccharomyces Genome Database (SGD) identifies the human gene encoding argininosuccinate synthase (ASS1) as the ortholog of yeast ARG1. An appropriate citation has been added to support this statement. The protein alignments have been provided as File S2.
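      Percent identity depends on how the denominator is chosen, which is one common reason two reports on the same pair of proteins can disagree. The sketch below is illustrative only: the function `percent_identity` is hypothetical, the toy sequences are made up and are not ASS or Arg1, and nothing here is claimed to explain the specific 27.5% and 49% figures quoted above.

```python
def percent_identity(aln_a, aln_b, denominator="aligned"):
    """Percent identity between two rows of a pairwise alignment.

    aln_a, aln_b -- equal-length aligned sequences, '-' marks a gap
    denominator  -- "aligned": columns where both rows have a residue
                    "columns": every alignment column, gaps included
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aln_a, aln_b))
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    if denominator == "aligned":
        denom = sum(1 for a, b in pairs if a != "-" and b != "-")
    elif denominator == "columns":
        denom = len(pairs)
    else:
        raise ValueError("unknown denominator")
    if denom == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / denom


# Made-up toy alignment, for demonstration only.
TOY_A = "MSKG-AVL"
TOY_B = "MSRGQAV-"

if __name__ == "__main__":
    print(percent_identity(TOY_A, TOY_B, "aligned"))
    print(percent_identity(TOY_A, TOY_B, "columns"))
```

      Stating the denominator alongside the reported identity, or pointing to the alignment file itself, lets readers reproduce the figure.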

      "This assay is based on the ability of human ASS to functionally replace (complement) its yeast ortholog (Arg1) in S. cerevisiae (Saccharomyces Genome Database, 2026). Importantly, successful functional complementation indicates that ASS enzymatic activity does not depend on any obligate human-specific protein interactions. At the protein level, human ASS and yeast Arg1 display 49% sequence identity (File S2) and share identical enzymatic roles in converting citrulline and aspartate into argininisuccinate."

      1. I appreciate the efforts made by the authors to share their work and make this study more reproducible, such as sharing the hASS1 and yASS1 plasmids on NCBI GenBank (Line 121) and publishing the ONT reads on SRA (Line 154). I request that additional data be shared, such as the custom method/code for codon optimization and a table of the Twist variant cassettes that were ordered. I would also love to see these results shared on MaveDB.org.

      We thank the reviewer for these suggestions regarding data sharing and reproducibility. As requested, we have provided the custom codon optimization script as File S1 and the amino acid alignment used to perform codon harmonization as File S2. The sequence of the underlying variant cassette is included in the corresponding GenBank entry, and we have clarified this point in the legend of Figure 1. For each amino acid substitution, Twist Bioscience used a yeast-specific codon scheme with a single consistent codon per amino acid; accordingly, the sequence of each variant cassette can be inferred from the base construct and the specified amino acid change. A complete list of variant amino acid substitutions used in this study is provided in Table S3.
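      The statement that each cassette sequence can be inferred from the base construct and the amino acid change can be made concrete as follows. This is a sketch under stated assumptions: the codon table `ONE_CODON` is a hypothetical one-codon-per-amino-acid scheme invented for illustration and is not Twist Bioscience's actual table, the function `apply_substitution` is hypothetical, and the toy ORF is not the real construct.

```python
import re

# Hypothetical single-codon-per-amino-acid scheme (NOT the vendor's table).
ONE_CODON = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATT",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
}


def apply_substitution(base_orf, change, codon_table=ONE_CODON):
    """Return the ORF with one codon replaced, e.g. change="L15V".

    base_orf -- coding sequence of the base construct (5' to 3')
    change   -- substitution in one-letter notation: ref, position, alt
    Note: the reference residue is parsed but not verified against base_orf.
    """
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if m is None:
        raise ValueError(f"cannot parse substitution: {change!r}")
    pos, alt = int(m.group(2)), m.group(3)
    start = 3 * (pos - 1)  # 1-based residue number to 0-based nucleotide offset
    if pos < 1 or start + 3 > len(base_orf):
        raise ValueError("position lies outside the ORF")
    return base_orf[:start] + codon_table[alt] + base_orf[start + 3:]


if __name__ == "__main__":
    toy_orf = "ATGTCTTTGGAA"  # toy ORF encoding M-S-L-E
    print(apply_substitution(toy_orf, "L3V"))
```

      With the real codon scheme and the GenBank base sequence substituted in, the same lookup would regenerate every cassette listed in Table S3.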

      1. I find this manuscript very exciting as the authors have a compelling assay that identifies pathogenic variants, but I was generally disappointed by the quality and organization of the figures. For example, Figure 4 provides very little insight, but could be dramatically improved with an overlay of the normalized growth score data or highlighting variants surrounding the substrate or ATP interfaces. There are some very interesting aspects of this manuscript that could shine through with some polished figures.

      We thank the reviewer for this feedback and agree that clear and well-organized figures are essential for conveying the key results of the study. In response, we have substantially revised Figure 4 by adding colored overlays showing residue conservation and median normalized growth scores (new panels Figure 4C and 4D), which more directly link structural context to functional outcomes and highlight patterns surrounding the active site and substrate interfaces.

      I would also encourage the authors to generate a heatmap of the data represented in Figure 2 (see Fowler and Fields 2014 PMID 25075907, Figure 2); this would be a more helpful reference for readers.

      The reviewer also suggested that a heatmap representation, similar to that used in Fowler and Fields (2014), might aid interpretation of the data shown in Figure 2. Because our dataset consists of sparse single-amino acid substitutions rather than a complete mutational scan, such heatmaps are inherently less dense and less effective at conveying patterns than in saturation mutagenesis studies. Nevertheless, to aid readers who may find this visualization useful, we have generated and included a single-nucleotide variant heatmap as Supplemental Figure S1.

      My major comments are as follows: 6. Citations needed - especially in the introduction and for establishing that hASS is a homolog of yARG1

      We have added the requested citations and clarified the ASS1-ARG1 orthology in the Introduction, as described in our response to point 3 above.

      1. Generally, the authors do a nice job distinguishing the ASS1 gene from the ASS enzyme, though I found some ambiguities (Line 685). Please double-check the use of each throughout the manuscript.

      We have edited the manuscript to ensure consistent and unambiguous use of gene and enzyme nomenclature throughout.

      1. Generally, I'm confused about what strain was used for integrating all these variants, was it the arg1 knock-out strain from the yeast knockout collection or was it FY4? I think FY4 was used for the preliminary experiments, then the KO collection strain was used for making the variant library but I think this could be made more clear in the text and figures. Lines 226-229 describes introducing the hASS1 and yASS1 sequences into the native ARG1 locus in strain FY4, but the Fig1A image depicts the ASS1 variants going into arg1 KO locus. Fig1A should be moved to Fig2.

      We agree that the strain construction steps were not described as clearly as they could have been. We have therefore clarified the strain construction workflow in the Materials & Methods and Results sections, as well as in the Figure 1 legend, to explicitly distinguish preliminary experiments performed in strain FY4 from construction of the variant library in the arg1 knockout background.

      As we have also added an additional panel to Figure 1 that schematically explains how the screen was performed (per Reviewer #2's request), we believe that Figure 1A is appropriately placed and should remain in Figure 1.

      1. Line 303 - "We classify these variants as 'functionally unimpaired'", this is not an accurate description of these variants as Figure 2 highlights 24 pathogenic ClinVar variants that would fall into this category of "functionally unimpaired". The yeast growth assay appears to capture pathogenic variants, but there is likely some nuance of human ASS functionality that is not being assessed here. I would make the language more specific, e.g. "complementary to Arg1" or "growth-compatible".

      We agree that the label "functionally unimpaired" could be misinterpreted if read as implying clinical benignity. We have therefore clarified within the manuscript that this designation refers strictly to variant behavior in the yeast growth assay (i.e., wild-type-like growth under assay conditions) and does not imply absence of pathogenicity. We also expanded the Discussion to explicitly address the subset of clinically pathogenic variants with high growth scores (>0.85), consistent with a ceiling effect of the assay and a key limitation for interpretation. See response to reviewer #3 point 1. Relevant revisions are also discussed in our responses to Reviewers #1 and #2.

      1. Lines 345-355 - It is interesting that there are variants that appear functional at the substrate interfacing sites. Is there anything common across these variants? Are they maintaining the polarity or hydrophobicity of the WT residue? Are any of these variants included in ClinVar or gnomAD? Are pathogenic variants found at any of these sites?

      Yes. For highly sensitive active-site residues that have few permissible variants, the vast majority of amino acid substitutions that do retain activity preserve key physicochemical properties of the wild-type residue, such as hydrophobicity or charge. We have added this important observation to the manuscript:

      "Any variants at these sensitive residues that are permissive for activity in our assay retain hydrophobicity or charged states relative to the original amino acid side chain (Figure 5A & Table S5)."

      None of these variants are present in ClinVar. Only L15V and E191D are present in gnomAD (Table S4).

      1. Lines 423-430 - The OddsPath calculation would seem to rely heavily on the thresholds of .85 for normalized growth. The OddsPath calculation could be bolstered with some additional analysis that emphasizes the robustness to alternative thresholds.

      We agree that the sensitivity of the OddsPath calculation to the choice of growth thresholds is an important consideration. In our assay, benign ClinVar variants and non-human primate variants are observed exclusively within the peak centered on wild-type growth, whereas clinically annotated variants falling below this peak are exclusively pathogenic. On this basis, we defined the upper boundary of the assay range interpreted as supporting pathogenicity as the lower boundary of the wild-type-centered peak in the growth distribution (as defined in Figure 3), rather than selecting a cutoff by direct optimization of the OddsPath. This choice reflects the observed concordance, in our dataset, between the onset of measurable functional impairment in the assay and clinical pathogenic annotation. Importantly, in practice the OddsPath value is locally robust to the precise placement of this boundary, remaining invariant across the range 0.82-0.88. Supporting our chosen threshold of 0.85, the lowest-growth benign or primate variant observed has a normalized growth value of 0.88, while the lowest growth observed among variants present as homozygotes in gnomAD was 0.86. We have clarified this rationale and analysis in the revised manuscript.
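      The local robustness described above follows from how OddsPath is computed: the value changes only when moving the boundary causes a control variant to cross it. The sketch below uses the standard definition, OddsPath = [P2 x (1 - P1)] / [(1 - P2) x P1], where P1 is the fraction of controls that are pathogenic and P2 is the fraction of pathogenic controls among those with an abnormal readout. Everything else is an assumption for illustration: the function names are hypothetical, the control scores are synthetic and are not the study's data, and counting one benign control as abnormal when none are observed is one convention for keeping the odds finite.

```python
def odds_path(n_path, n_benign, n_path_abnormal, n_benign_abnormal):
    """OddsPath = [P2 * (1 - P1)] / [(1 - P2) * P1]."""
    p1 = n_path / (n_path + n_benign)  # prior proportion pathogenic
    # Assumed convention: with zero abnormal benign controls, count one
    # so that P2 < 1 and the odds remain finite.
    n_benign_abnormal = max(n_benign_abnormal, 1)
    p2 = n_path_abnormal / (n_path_abnormal + n_benign_abnormal)
    return (p2 * (1 - p1)) / ((1 - p2) * p1)


def odds_path_at(threshold, path_scores, benign_scores):
    """OddsPath when 'abnormal' means normalized growth below threshold."""
    n_pa = sum(s < threshold for s in path_scores)
    n_ba = sum(s < threshold for s in benign_scores)
    return odds_path(len(path_scores), len(benign_scores), n_pa, n_ba)


# Synthetic control scores for demonstration only (not study data).
TOY_PATH = [0.00, 0.05, 0.30, 0.60, 0.90, 0.95]
TOY_BENIGN = [0.88, 0.95, 1.00, 1.02]

if __name__ == "__main__":
    for t in (0.82, 0.85, 0.88):
        print(t, round(odds_path_at(t, TOY_PATH, TOY_BENIGN), 3))
```

      In this synthetic set no control score falls between 0.82 and 0.88, so the three thresholds yield the same OddsPath; a sweep of this kind over the real controls is one way to document the stated invariance.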

      "Notably, the "Among all nine of the human ASS1 missense variants observed as homozygotes in gnomAD which were tested as amino acid substitutions in our assay, the lowest observed growth value was 0.86 (Ala258Val) consistent with the lower boundary of the PrimateAI variants which was a growth value of 0.87 (Ala81Thr) (Figure 6) and with our use of a 0.85 classification threshold."

      "If we treat PrimateAI variants as benign (solely for OddsPath calculation purposes), the OddsPath for growth

      1. Lines 432-441 - This is an interesting idea to use variants observed in primates, has ACMG weighed in on this? I understand that CTLN1 is an autosomal recessive disorder but I'd still be interested in seeing how the observed ASS1 missense variants in gnomAD perform in your growth assay, possibly a supplemental figure?

      To our knowledge, the ACMG/AMP guidelines do not currently address the use of homozygous missense variants observed in non-human primates. We are currently in discussion with two ClinGen working groups about the possibility of formalizing the use of this data source.

      We agree that comparison with human population data is also important. Accordingly, total gnomAD allele counts and homozygous counts for all applicable ASS1 missense variants are provided in Table S4, and the growth behavior of ASS1 missense variants observed in the homozygous state in gnomAD is shown in Figure 6. These homozygous variants uniformly exhibit high growth in our assay, consistent with the absence of strong loss-of-function effects. We have updated the manuscript text to clarify these points.

      Minor comments 1. Lines 53-59 - This paragraph needs to cite the literature, especially lines 56, 57, and 59 2. Line 61 - no need to repeat "citrullinemia type I", just use the abbreviation as it was introduced in the paragraph above 3. Lines 61-71 - again, this paragraph needs more literature citations 4. Line 62 - change to "results"

      The changes suggested in points 1-4 have all been implemented in the revised manuscript.

      1. Line 74-75 - "RUSP" acronym not needed as it's never used in the manuscript, the same goes for "HHS"

      We agree that the acronyms "RUSP" and "HHS" are not reused elsewhere in the manuscript. We have nevertheless retained them at first mention, alongside the expanded names, because these acronyms are commonly used in newborn screening and public health policy contexts and may be more familiar to some readers than the expanded terms. We would be happy to remove the acronyms if preferred.

      1. Line 86 - "ASS1" I think is referring to the enzyme and should just be "ASS"? If referring to the gene then italicize to "ASS1"
      2. Lines 91-93 - It would be helpful to mention this is a functional screen in yeast
      3. Line 101 - It would be helpful to the readers to define SD before using the acronym, consider changing to "minimal synthetic defined (SD) medium" and afterwards can refer to as "SD medium"
      4. 109-114 - It would be great if you could share your method for designing the codon-harmonized yASS1 gene, consider sharing as a supplemental script or creating a GitHub repository linked to a Zenodo DOI for publication.

      The changes suggested in points 6-9 have all been implemented in the revised manuscript. The codon harmonization script has been provided as File S1.

      1. Lines 135-137 - I think it's helpful to provide a full table of the cassettes ordered from Twist as well as the primers used to amplify them, consider a supplemental table.

      Details of the Twist cassettes and the primer sequences used for amplification have been added to the Materials & Methods.

      1. Line 138 - "standard methods" is a bit vague, I'm guessing this is a Gietz and Schiestl 2007 LiAc/ssDNA protocol (PMID 17401334)? Also, was ClonNAT used to select for natMX colonies?

      The reviewer is correct about which protocol was used, and we have added the citation. We have also clarified that selection was carried out based on resistance to nourseothricin.

      1. Line 150 - change to "sequence the entire open reading frame, as previously described [4]."
      2. Line 222-223 - remove "replace" and just use "complement" (and remove the parenthesis)
      3. Line 249 - It would be great to see a supplemental alignment of the hASS1 and yASS1 sequences.
      4. Line 261 - spelling "citrullemia" should be corrected to "citrullinemia"
      5. Line 280 - "using Oxford Nanopore sequencing" is a bit vague, I suggest specifying the equipment used (e.g. Oxford Nanopore Technologies MinION platform) or simplify to "via long-read sequencing (see Materials & Methods)"

      The changes suggested in points 12-16 have all been implemented in the revised manuscript. An alignment of the ASS and Arg1 protein sequences has been provided as File S2.

      1. Line 287-289 - It would be great to see the average number of isolates per variant, as well as a plot of the variant growth estimate vs individual isolate growth.

      We agree with the reviewer that conveying measurement precision is important. The number of isolates assayed per variant is provided in Table S4, and we have added explicit mention of this in the text. Because variants were assayed with a mixture of 1, 2, or ≥3 independent isolates, a scatterplot of variant-level growth estimates versus individual isolate measurements would be difficult to interpret and potentially misleading. Instead, we report standard error estimates for each variant in Table S4, derived from the linear model used to estimate growth effects, which more appropriately summarizes measurement uncertainty given the experimental design.

      1. Lines 324-25 - consider removing the last sentence of this paragraph, it is redundant as the following paragraph starts with the same statement.

      We have removed this sentence.

      1. Lines 327-335 - This is interesting and would benefit from its own subpanel or plot in which the normalized growth score is plotted against variants that are at conserved or diverse residues in human ASS, and see if there's a statistical difference in score between the two groupings.

      As suggested by the reviewer, we have added Supplemental Figure 2 (Figure S2) in which the normalized growth score of each variant is plotted against the conservation of the corresponding residue, as measured by ConSurf. The manuscript already includes a statistical analysis of the relationship between residue conservation and functional impact, showing that amorphic variants occur significantly more frequently at highly conserved residues than unimpaired variants do (one-sided Fisher's exact test). We now refer to this new supplemental figure in the relevant Results section.
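      For readers unfamiliar with the test named above, a one-sided Fisher's exact test on a 2x2 table can be computed directly from the hypergeometric distribution. This is a minimal standard-library sketch: the function name `fisher_exact_greater` is hypothetical, and the counts in the usage example are invented and are not the study's data.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be, for example, amorphic vs unimpaired variants, and
    columns highly conserved vs other residues. Returns the probability
    of seeing a count of at least `a` in the top-left cell when the
    row and column totals are held fixed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, col1)
    tail = 0
    for x in range(a, min(row1, col1) + 1):
        # comb returns 0 when col1 - x exceeds the second row's size
        tail += comb(row1, x) * comb(n - row1, col1 - x)
    return tail / total


if __name__ == "__main__":
    # Invented counts, for demonstration only.
    print(fisher_exact_greater(8, 2, 1, 5))
```

      A small p-value here would indicate that the first row is enriched in the first column beyond what the marginal totals alone would predict.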

      1. Lines 339-341 - As written, it is unclear if aspartate interacts with all of the same residues as citrulline or just Asn123 and Thr119.
      2. Lines 345-355 - As with my above comment, I find this interesting and would
      3. Line 353 - add a period to "al" in "Diez-Fernandez et al."

      The issues raised in points 20 and 22 have both been addressed. Point 21 appears to be truncated.

      1. Figure 1 a. Remove "Figure" from the subpanels and show just "A" and "B" (as you do for Figure 4) and combine the two images into a single image. Also make this correction to Figure 5 and Figure 8. b. Panel A - I thought the hASS1 and yASS1 were dropped into FY4, not the arg1 KO strain. This needs clarification. c. Panel A - I'm assuming the natMX cassette contains its own promoter, you could use a right-angled arrow to indicate where the promoters are in your construct. d. Panel B - I'm not sure the bar graph is necessary, it would be more helpful to see calculations of the colony size (or growth curves for each strain) and plot the raw values (maybe pixel counts?) for each replicate rather than normalizing to yeast ARG1. It would be great to have a supplemental figure showing all the replicates side-by-side. e. Panel B - Would be helpful to denote the pathogenic and benign ClinVar variants with an icon or colored text.

      f. Figure 1 Caption - make "A)" and "B)" bold.

      We have implemented the requested changes in Figure 1 with the following exceptions. We have retained panels A and B as separate subfigures because they illustrate distinct experimental concepts. In addition, we respectfully disagree with point (d). The bar graph is intended to provide a clear, high-level comparison of functional complementation by hASS1 versus yASS1 and to illustrate the gross differences in growth between benign and pathogenic proof-of-principle variants. As the bar graph includes error bars for standard deviations, presenting raw colony size measurements or growth curves for individual replicates would substantially complicate the figure without materially improving interpretability for this purpose.

      24. Figure 2

      a. "Shown in magenta are amino acid substitutions corresponding to ClinVar pathogenic, pathogenic/likely pathogenic, and likely pathogenic variants" is repeated in the figure caption.

      b. "Shown in green are amino acid substitutions corresponding to ClinVar benign and likely benign variants." I don't see any green points.

      c. Identify the colors used for ASS1 substrate binding residues.

      d. This plot would benefit from a depiction of the human ASS secondary structure and any protein domains (nucleotide-binding domain, synthase domain, and C-terminal helix from Fig4B).

      e. Line 685 - "ASS1" is being used in reference to the enzyme, is this correct or should it be "ASS"?

      We have made the requested changes to Figure 2. The repeated caption text has been removed, and references to green points have been corrected to orange points to match the figure. The colors used to indicate ASS substrate-binding residues are explicitly described in the figure key. Secondary structure annotations have been added. References to the enzyme have been corrected to "ASS" rather than "ASS1" where appropriate.

      25. Figure 3

      a. Rename the "unimpaired" category as there are several pathogenic ClinVar variants that fall into this category.

      To address this point, we have clarified the labeling by adding "in our yeast assay" to the figure legend, making explicit that the "unimpaired" category refers only to wild-type-like behavior under assay conditions and does not imply clinical benignity. See also response to Reviewer #3, Major Comment 1.

      26. Figure 4

      a. List the PDB or AlphaFold accession used for this structure.

      b. Panel A - state which colors are used to depict each monomer. It is confusing to see several shades of pink/purple used to depict a single monomer in Panel A.

      c. It is very difficult to make out the aspartate and citrulline substrates in the catalytic binding activity; consider making an inset zooming in on this domain and displaying a ribbon diagram of the structure rather than the surface.

      d. Generally, it would be more helpful here to label any particular residues that were identified as pathogenic from your screen, or to overlay average growth scores per residue onto the structure.

      We have implemented the requested changes to Figure 4. The relevant PDB/AlphaFold accession is now listed, and the colors used to depict each monomer in Panel A are clarified in the figure legend. An inset focusing on the active site has been added to improve visualization of the citrulline and aspartate substrates. In addition, we have added new panels (Figure 4C and 4D) overlaying pathogenic residues and average growth scores onto the structure to more directly link structural context with functional data.

      27. Figure 5

      a. Line 716 - Insert a page break to place Figure 5 on its own page.

      b. I suggest using a heatmap for this type of plot, as it is very difficult to track which color corresponds to which residue.

      c. Fig5A - This plot could be improved by identifying which residue positions interface with which substrate.

      We have placed Figure 5 on its own page and added information to the legend identifying which residue positions interface with each substrate. We have retained the active-site variant strip charts raised in point (b), as we believe they effectively illustrate how the distribution of variant effects differs between residues. In addition, we have provided a supplemental heatmap showing variant growth across the entire protein (Figure S1), and individual variant scores for all residues are provided in Table S4.

      28. Figure 7

      a. Line 735 - Insert page break to place figure on a new page.

      b. List the PDB accession used for these images.

      c. For clarity I would mention "human ASS" in the figure title.

      d. State the colors of the substrates.

      e. Panels A and B could be combined into a single panel, making it easier to distinguish the active site and dimerization variants.

      f. Could be interesting to get SASA scores for the ClinVar structural variants to determine if they are surface-accessible.

      We have implemented the requested changes in Figure 7 with the following exceptions. For point (e), there is no single orientation of the structure that allows a clear simultaneous view of both active-site and dimerization variants; accordingly, we have retained panels A and B as separate subfigures to preserve clarity. With respect to point (f), we agree that solvent accessibility analysis could be informative in other contexts. However, such an analysis does not integrate naturally with the functional and assay-based framework of the present study and was therefore not included.

      29. Figure 8

      a. Panel B - overlay a square frame in the larger protein structure that depicts where the below inset is focused, and frame inset image as well.

      We have framed the inset image as requested. We did not add a corresponding frame to the full protein structure, as doing so obscured structural details in the region of interest.

      Reviewer #3 (Significance (Required)):

      Section 2 - Significance This study represents a substantial technical, functional, and translational advance in the interpretation of missense variation in ASS1, a gene of high clinical relevance for the rare disease citrullinemia type I. Its principal strength lies in the generation of an experimentally validated functional atlas of ASS1 missense variants that covers ~90% of all SNV-accessible substitutions. The scale, internal reproducibility, and careful benchmarking of the yeast complementation assay against known pathogenic and benign variants provide a robust foundation for identifying pathogenic ASS1 variants. Particularly strong aspects include the rigorous quality control of variant identities, the quantitative nature of the functional readout, and the thoughtful integration of results into the ACMG/AMP OddsPath framework. The discovery of intragenic complementation between variants affecting distinct structural regions of the enzyme is a notable conceptual and mechanistic contribution. Limitations include the assay's reduced sensitivity to variants impacting oligomerization or subtle folding defects, and the use of yeast as a heterologous system, which may mask disease-relevant mechanisms as several pathogenic ClinVar variants were found to be "functionally unimpaired". Future work extending functional testing to additional cellular contexts or expanding genotype-level combinatorial analyses would further enhance clinical applicability. Relative to prior studies, which have relied on small numbers of patient-derived variants or low-throughput biochemical assays, this work extends the field decisively by delivering a comprehensive, variant-resolved functional map for ASS1. 
      To the best of my current knowledge, this is the first systematic functional screen of ASS1 at this scale and the first direct experimental demonstration that ASS active sites span multiple subunits, enabling intragenic complementation consistent with Crick and Orgel's classic variant sequestration model. As such, the advance is simultaneously technical (high-throughput functional genomics), mechanistic (defining structural contributors to catalysis and epistasis), and clinical (enabling evidence-based reclassification of VUS). I find the use of homozygous non-human primate variants as an orthogonal benign calibration set both creative and controversial; my hope is that this manuscript will prompt a productive discussion.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary

      This manuscript presents a comprehensive functional profiling of 2,193 ASS1 missense variants using a yeast complementation assay, providing valuable data for variant interpretation in the rare disease citrullinemia type I. The dataset is extensive, technically sound, and clinically relevant. The demonstration of intragenic complementation in ASS1 is novel and conceptually important. Overall, the study represents a substantial contribution to functional genomics and rare disease variant interpretation.

      Major comments

      This is an exciting paper, as it can support clinicians in making actionable decisions when diagnosing infants. I have a few major comments, but I want to emphasize that I find the label "functionally unimpaired" misleading. The authors explain that several pathogenic ClinVar variants fall into this category (above the >.85 growth threshold), but I think this category needs a more specific name, and I would ask the authors to reiterate the shortcomings of the assay in the Discussion section. There is an important discussion to be had here: is the assay detecting variants that alter the function of ASS, or is it detecting a complete ablation of enzymatic activity? The results might be strengthened with a follow-up experiment that identifies stably expressed ASS1 variants. At the very least, it would be great to see the authors replicate some of the interesting results from the high-throughput screen by down-selecting to ~12 variants of uncertain significance that could be newly considered pathogenic. I would ask the authors to provide more citations of the literature in the introduction of the manuscript. I would be especially interested in knowing more about human ASS being identified as a homolog of yeast ARG1, as they share little sequence similarity (27.5%) at the protein level. That said, I find the yeast complementation assay exciting. I appreciate the efforts made by the authors to share their work and make this study more reproducible, such as depositing the hASS1 and yASS1 plasmids in NCBI GenBank (Line 121) and publishing the ONT reads on SRA (Line 154). I have made requests for additional data to be shared, such as the custom method/code for codon optimization and a table of the Twist variant cassettes that were ordered. I would also love to see these results shared on MaveDB.org.

      I find this manuscript very exciting, as the authors have a compelling assay that identifies pathogenic variants, but I was generally disappointed by the quality and organization of the figures. For example, Figure 4 provides very little insight but could be dramatically improved with an overlay of the normalized growth score data or by highlighting variants surrounding the substrate or ATP interfaces. There are some very interesting aspects of this manuscript that could shine through with polished figures. I would also encourage the authors to generate a heatmap of the data represented in Figure 2 (see Fowler and Fields 2014, PMID 25075907, Figure 2); this would be a more helpful reference for readers.

      My major comments are as follows:

      1. Citations needed - especially in the introduction and for establishing that hASS is a homolog of yARG1
      2. Generally, the authors do a nice job distinguishing the ASS1 gene from the ASS enzyme, though I found some ambiguities (Line 685). Please double-check the use of each throughout the manuscript
      3. Generally, I'm confused about what strain was used for integrating all these variants: was it the arg1 knock-out strain from the yeast knockout collection or was it FY4? I think FY4 was used for the preliminary experiments, then the KO collection strain was used for making the variant library, but I think this could be made clearer in the text and figures. Lines 226-229 describe introducing the hASS1 and yASS1 sequences into the native ARG1 locus in strain FY4, but the Fig1A image depicts the ASS1 variants going into the arg1 KO locus. Fig1A should be moved to Fig2.
      4. Line 303 - "We classify these variants as 'functionally unimpaired'", this is not an accurate description of these variants as Figure 2 highlights 24 pathogenic ClinVar variants that would fall into this category of "functionally unimpaired". The yeast growth assay appears to capture pathogenic variants, but there is likely some nuance of human ASS functionality that is not being assessed here. I would make the language more specific, e.g. "complementary to Arg1" or "growth-compatible".
      5. Lines 345-355 - It is interesting that there are variants that appear functional at the substrate interfacing sites. Is there anything common across these variants? Are they maintaining the polarity or hydrophobicity of the WT residue? Are any of these variants included in ClinVar or gnomAD? Are pathogenic variants found at any of these sites?
      6. Lines 423-430 - The OddsPath calculation would seem to rely heavily on the thresholds of <.05 and >.85 for normalized growth. The OddsPath calculation could be bolstered with some additional analysis that emphasizes the robustness to alternative thresholds.
      7. Lines 432-441 - This is an interesting idea to use variants observed in primates, has ACMG weighed in on this? I understand that CTLN1 is an autosomal recessive disorder but I'd still be interested in seeing how the observed ASS1 missense variants in gnomAD perform in your growth assay, possibly a supplemental figure?
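
      The threshold-robustness check requested in major comment 6 could be sketched as follows. This is only an illustration: it assumes the commonly used formulation OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], where P1 is the proportion of pathogenic variants among all controls and P2 is the proportion of pathogenic variants among controls with an abnormal readout. The function name `odds_path`, the control scores, and the swept thresholds are hypothetical and are not taken from the manuscript.

```python
from typing import Optional, Sequence


def odds_path(scores: Sequence[float],
              pathogenic: Sequence[bool],
              abnormal_below: float) -> Optional[float]:
    """OddsPath for 'functionally abnormal' calls at a given growth threshold.

    Assumed formulation: [P2 * (1 - P1)] / [(1 - P2) * P1].
    Returns None when the quantity is undefined and inf when every control
    called abnormal is pathogenic (P2 == 1).
    """
    n = len(scores)
    p1 = sum(pathogenic) / n  # prior: fraction of pathogenic controls
    flagged = [is_path for s, is_path in zip(scores, pathogenic) if s < abnormal_below]
    if not flagged or p1 in (0.0, 1.0):
        return None
    p2 = sum(flagged) / len(flagged)  # posterior among abnormal calls
    if p2 == 1.0:
        return float("inf")
    return (p2 * (1 - p1)) / ((1 - p2) * p1)


# Hypothetical control set: normalized growth score and ClinVar-style label.
control_scores = [0.01, 0.02, 0.03, 0.04, 0.50, 0.90, 0.95, 1.00]
control_is_pathogenic = [True, True, True, False, True, False, False, False]

# Sweep alternative 'abnormal' thresholds around the <.05 cutoff.
for threshold in (0.03, 0.05, 0.10):
    value = odds_path(control_scores, control_is_pathogenic, threshold)
    print(f"threshold < {threshold:.2f}: OddsPath = {value}")
```

      If the resulting evidence strength stays in the same category across a reasonable range of thresholds, the classification can be considered robust to the exact cutoff.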

      Minor comments

      1. Lines 53-59 - This paragraph needs to cite the literature, especially lines 56, 57, and 59
      2. Line 61 - no need to repeat "citrullinemia type I", just use the abbreviation as it was introduced in the paragraph above
      3. Lines 61-71 - again, this paragraph needs more literature citations
      4. Line 62 - change to "results"
      5. Line 74-75 - "RUSP" acronym not needed as it's never used in the manuscript, the same goes for "HHS"
      6. Line 86 - "ASS1" I think is referring to the enzyme and should just be "ASS"? If referring to the gene then italicize to "ASS1"
      7. Lines 91-93 - It would be helpful to mention this is a functional screen in yeast
      8. Line 101 - It would be helpful to the readers to define SD before using the acronym, consider changing to "minimal synthetic defined (SD) medium" and afterwards can refer to as "SD medium"
      9. Lines 109-114 - It would be great if you could share your method for designing the codon-harmonized yASS1 gene; consider sharing it as a supplemental script or creating a GitHub repository linked to a Zenodo DOI for publication.
      10. Lines 135-137 - I think it's helpful to provide a full table of the cassettes ordered from Twist as well as the primers used to amplify them, consider a supplemental table
      11. Line 138 - "standard methods" is a bit vague; I'm guessing this is the Gietz and Schiestl 2007 LiAc/ssDNA protocol (PMID 17401334)? Also, was ClonNAT used to select for natMX colonies?
      12. Line 150 - change to "sequence the entire open reading frame, as previously described [4]."
      13. Line 222-223 - remove "replace" and just use "complement" (and remove the parenthesis)
      14. Line 249 - It would be great to see a supplemental alignment of the hASS1 and yASS1 sequences
      15. Line 261 - spelling "citrullemia" should be corrected to "citrullinemia"
      16. Line 280 - "using Oxford Nanopore sequencing" is a bit vague, I suggest specifying the equipment used (e.g. Oxford Nanopore Technologies MinION platform) or simplify to "via long-read sequencing (see Materials & Methods)"
      17. Line 287-289 - It would be great to see the average number of isolates per variant, as well as a plot of the variant growth estimate vs individual isolate growth
      18. Lines 324-25 - consider removing the last sentence of this paragraph, it is redundant as the following paragraph starts with the same statement
      19. Lines 327-335 - This is interesting and would benefit from its own subpanel or plot in which the normalized growth score is plotted against variants that are at conserved or diverse residues in human ASS, and see if there's a statistical difference in score between the two groupings
      20. Lines 339-341 - As written, it is unclear if aspartate interacts with all of the same residues as citrulline or just Asn123 and Thr119.
      21. Lines 345-355 - As with my above comment, I find this interesting and would
      22. Line 353 - add a period to "al" in "Diez-Fernandex et al."
      23. Figure 1

      a. Remove "Figure" from the subpanels and show just "A" and "B" (as you do for Figure 4) and combine the two images into a single image. Also make this correction to Figure 5 and Figure 8

      b. Panel A - I thought the hASS1 and yASS1 were dropped into FY4, not the arg1 KO strain. This needs clarification

      c. Panel A - I'm assuming the natMX cassette contains its own promoter; you could use a right-angled arrow to indicate where the promoters are in your construct

      d. Panel B - I'm not sure the bar graph is necessary; it would be more helpful to see calculations of the colony size (or growth curves for each strain) and plot the raw values (maybe pixel counts?) for each replicate rather than normalizing to yeast ARG1. It would be great to have a supplemental figure showing all the replicates side-by-side

      e. Panel B - Would be helpful to denote the pathogenic and benign ClinVar variants with an icon or colored text

      f. Figure 1 Caption - make "A)" and "B)" bold

      24. Figure 2

      a. "Shown in magenta are amino acid substitutions corresponding to ClinVar pathogenic, pathogenic/likely pathogenic, and likely pathogenic variants" is repeated in the figure caption

      b. "Shown in green are amino acid substitutions corresponding to ClinVar benign and likely benign variants." I don't see any green points

      c. Identify the colors used for ASS1 substrate binding residues

      d. This plot would benefit from a depiction of the human ASS secondary structure and any protein domains (nucleotide-binding domain, synthase domain, and C-terminal helix from Fig4B)

      e. Line 685 - "ASS1" is being used in reference to the enzyme, is this correct or should it be "ASS"?

      25. Figure 3

      a. Rename the "unimpaired" category as there are several pathogenic ClinVar variants that fall into this category

      26. Figure 4

      a. List the PDB or AlphaFold accession used for this structure

      b. Panel A - state which colors are used to depict each monomer. It is confusing to see several shades of pink/purple used to depict a single monomer in Panel A

      c. It is very difficult to make out the aspartate and citrulline substrates in the catalytic binding activity; consider making an inset zooming in on this domain and displaying a ribbon diagram of the structure rather than the surface.

      d. Generally, it would be more helpful here to label any particular residues that were identified as pathogenic from your screen, or to overlay average growth scores per residue onto the structure

      27. Figure 5

      a. Line 716 - Insert a page break to place Figure 5 on its own page

      b. I suggest using a heatmap for this type of plot, as it is very difficult to track which color corresponds to which residue

      c. Fig5A - This plot could be improved by identifying which residue positions interface with which substrate

      28. Figure 7

      a. Line 735 - Insert page break to place figure on a new page

      b. List the PDB accession used for these images

      c. For clarity I would mention "human ASS" in the figure title

      d. State the colors of the substrates

      e. Panels A and B could be combined into a single panel, making it easier to distinguish the active site and dimerization variants

      f. Could be interesting to get SASA scores for the ClinVar structural variants to determine if they are surface-accessible

      29. Figure 8

      a. Panel B - overlay a square frame in the larger protein structure that depicts where the below inset is focused, and frame inset image as well.

      Significance

      This study represents a substantial technical, functional, and translational advance in the interpretation of missense variation in ASS1, a gene of high clinical relevance for the rare disease citrullinemia type I. Its principal strength lies in the generation of an experimentally validated functional atlas of ASS1 missense variants that covers ~90% of all SNV-accessible substitutions. The scale, internal reproducibility, and careful benchmarking of the yeast complementation assay against known pathogenic and benign variants provide a robust foundation for identifying pathogenic ASS1 variants. Particularly strong aspects include the rigorous quality control of variant identities, the quantitative nature of the functional readout, and the thoughtful integration of results into the ACMG/AMP OddsPath framework. The discovery of intragenic complementation between variants affecting distinct structural regions of the enzyme is a notable conceptual and mechanistic contribution. Limitations include the assay's reduced sensitivity to variants impacting oligomerization or subtle folding defects, and the use of yeast as a heterologous system, which may mask disease-relevant mechanisms as several pathogenic ClinVar variants were found to be "functionally unimpaired". Future work extending functional testing to additional cellular contexts or expanding genotype-level combinatorial analyses would further enhance clinical applicability.

      Relative to prior studies, which have relied on small numbers of patient-derived variants or low-throughput biochemical assays, this work extends the field decisively by delivering a comprehensive, variant-resolved functional map for ASS1. To the best of my current knowledge, this is the first systematic functional screen of ASS1 at this scale and the first direct experimental demonstration that ASS active sites span multiple subunits, enabling intragenic complementation consistent with Crick and Orgel's classic variant sequestration model. As such, the advance is simultaneously technical (high-throughput functional genomics), mechanistic (defining structural contributors to catalysis and epistasis), and clinical (enabling evidence-based reclassification of VUS). I find the use of homozygous non-human primate variants as an orthogonal benign calibration set both creative and controversial; my hope is that this manuscript will prompt a productive discussion.

    1. Briefing: Freeing France's Éducation nationale from Microsoft's monopoly

      This document summarizes the issues raised by the technological dependence of France's national education system (Éducation nationale) on Microsoft and the emergence of a structured alternative built on free software and teacher collaboration.

      Executive Summary

      France's Éducation nationale faces a costly, structural dependence on proprietary solutions, chiefly Microsoft's.

      The forced move from Windows 10 to Windows 11 illustrates this vulnerability: software obsolescence could cost up to one billion euros nationwide to renew the installed base of computers.

      In response, an open-source "guerrilla" movement is taking shape. Driven by the Direction du numérique pour l'éducation (DNE) and initiatives such as « La Forge », this movement now mobilizes 10,000 teacher-developers.

      The goal is to replace costly licenses with "digital commons" (Linux, BigBlueButton, NextCloud), guaranteeing data sovereignty, the durability of public investment, and teaching tools suited to real classroom needs.

      --------------------------------------------------------------------------------

      1. A critical dependence: the Microsoft "textbook case"

      The relationship between the school system and Microsoft is described as a form of budgetary and technical addiction.

      The cost of imposed obsolescence

      The Hauts-de-France example: Following a ransomware cyberattack, the region had to consider migrating to Windows 11.

      A member of the IT department (DSI) estimated the cost of replacing 30,000 PCs unable to support that upgrade at 100 million euros.

      National extrapolation: Since Hauts-de-France accounts for roughly 10% of the national education system, the total cost of the forced upgrade of the installed base (300,000 machines) is estimated at 1 billion euros.

      Bundled sales: The monopoly rests on the bundling mechanism, in which the operating system comes pre-installed with no price distinction between hardware and software, imposing a "turnkey" solution that slows the adoption of alternatives.
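
      The national extrapolation above is simple proportional scaling. As a minimal sketch, the arithmetic can be reproduced from the three figures quoted in the briefing (100 million euros, 30,000 PCs, a roughly 10% regional share); the variable names are illustrative, and the per-machine figure is merely implied by those numbers rather than stated in the source.

```python
# Figures quoted in the briefing.
regional_cost_eur = 100_000_000   # estimated cost for Hauts-de-France
regional_pcs = 30_000             # PCs unable to run Windows 11
regional_share_percent = 10       # region's approximate share of the national system

# Implied average cost per replaced machine (not stated in the source).
cost_per_pc_eur = regional_cost_eur / regional_pcs

# Scale the regional figures up to the national level.
national_pcs = regional_pcs * 100 // regional_share_percent
national_cost_eur = regional_cost_eur * 100 // regional_share_percent

print(f"implied cost per PC: about {cost_per_pc_eur:,.0f} EUR")
print(f"national machines:   {national_pcs:,}")
print(f"national cost:       {national_cost_eur:,} EUR")
```

      The estimate therefore rests entirely on the regional cost per machine and on the assumption that the region is representative of the country as a whole.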

      Limits of proprietary services

      Recurring costs: Tens of millions of euros are paid out in licenses every year.

      Systemic weaknesses: The Covid-19 crisis exposed the shortcomings of the educational digital system, in particular its dependence on costly proprietary solutions and its lack of overall coherence.

      --------------------------------------------------------------------------------

      2. The break-away strategy: free software

      Against the monopoly, Linux-based and open-source solutions are proving viable in the field.

      Linux distributions dedicated to education

      Robust alternatives exist that allow the computer to be adapted to teaching needs:

      PrimTux: An operating system designed specifically for primary schools.

      NIRD (Numérique Inclusif, Responsable et Durable): A distribution intended for secondary schools.

      Obstacles and levers for adoption

      | Obstacle | Current state | Outlook |
      | --- | --- | --- |
      | Subject-specific software | Some publishers (life and earth sciences (SVT), physics, technology) develop only for Windows. | Pressure through scale: growth of the Linux installed base should force publishers to adapt. |
      | School-administration software | Pronote offers a full Windows client but only a degraded web version under Linux. | Clients need to evolve toward interoperable standards. |
      | Resilience | During an attack (ransomware), Windows-based systems were paralyzed. | Linux-based high schools (e.g., Lycée Carnot in Bruay-la-Buissière) were able to offer their help and their tools. |

      --------------------------------------------------------------------------------

      3. « La Forge »: industrializing teacher-led innovation

      « La Forge » represents a paradigm shift: from the isolated "tinkerer" teacher to a structured community of developers within the State.

      A large-scale collaborative model

      Membership: 10,000 registered teachers.

      Volume: 6,500 projects (code repositories) recorded.

      How it works: A collaborative work tool (modeled on GitHub) for bringing together, testing, and sharing source code and teaching resources.

      Examples of flagship projects

      MindStory: An open-source alternative to Minecraft that lets students collaborate on builds without depending on a paid Microsoft account.

      Philo GPT: An interface for conversing with simulations of great philosophers.

      Execubot: A tool for learning programming through a virtual robot.

      Créa-appli: A tool that uses AI to help teachers generate application prototypes (HTML/JS) through "vibe coding" (coding by prompt).

      --------------------------------------------------------------------------------

      4. Sovereignty, Digital Commons, and Public Procurement

      The stakes are not only technical but also political and financial: ensuring that public money funds public goods.

      The notion of "digital commons"

      A digital commons rests on three pillars: a resource, a community, and a governance structure. The idea is that any improvement the ministry makes to a piece of software benefits everyone.

      Sovereign services already deployed

      The ministry operates and hosts its own instances of free software to free itself from the GAFAM:

      BigBlueButton: An alternative to Zoom/Meet for videoconferencing (the ministry contributes financially to the development of the overall project).

      Apps.education.fr: A portal bringing together tools such as Tube (a YouTube alternative based on PeerTube) and NextCloud (a Google Drive alternative).

      Critique of the traditional public procurement model

      In the past, the State stimulated start-ups ("EdTech") through public contracts without requiring ownership of the intellectual property:

      1. The companies kept the source code and the data.

      2. The State had to pay subscriptions to keep using what it had funded.

      3. Result: nothing was built up over the long term.

      The new approach favors durability: investing in open source lets the institution keep control of its tools even after a contract with a provider ends.

      --------------------------------------------------------------------------------

      Key Quotes

      "The Éducation nationale is hooked on Microsoft. Every year, tens of millions of euros vanish into licenses."

      "The Forge's slogan is: 'L'union fait la forge'." [a play on "l'union fait la force", "unity is strength"]

      "We forgot that our teachers were also capable of making their own resources... We signed contracts with these EdTech companies in which we required nothing in terms of intellectual property. The companies walked away with all of the code."

      "A billion for an operating system upgrade that was imposed by Microsoft because Microsoft announced that it is ending support for Windows 10."

    1. Abstract

      Background: Large language models (LLMs) have significantly advanced natural language processing in biomedical research, however, their reliance on implicit, statistical representations often results in factual inaccuracies or hallucinations, posing significant concerns in high-stakes biomedical contexts.

      Results: To overcome these limitations, we developed BTE-RAG, a retrieval-augmented generation framework that integrates the reasoning capabilities of advanced language models with explicit mechanistic evidence sourced from BioThings Explorer, an API federation of more than sixty authoritative biomedical knowledge sources. We systematically evaluated BTE-RAG in comparison to traditional LLM-only methods across three benchmark datasets that we created from DrugMechDB. These datasets specifically targeted gene-centric mechanisms (798 questions), metabolite effects (201 questions), and drug–biological process relationships (842 questions). On the gene-centric task, BTE-RAG increased accuracy from 51% to 75.8% for GPT-4o mini and from 69.8% to 78.6% for GPT-4o. In metabolite-focused questions, the proportion of responses with cosine similarity scores of at least 0.90 rose by 82% for GPT-4o mini and 77% for GPT-4o. While overall accuracy was consistent in the drug–biological process benchmark, the retrieval method enhanced response concordance, producing a greater than 10% increase in high-agreement answers (from 129 to 144) using GPT-4o.

      Conclusion: Federated knowledge retrieval provides transparent improvements in accuracy for large language models, establishing BTE-RAG as a valuable and practical tool for mechanistic exploration and translational biomedical research.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag007), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Christopher Tabone

      Dear Authors,

      Thank you for the opportunity to review "Federated Knowledge Retrieval Elevates Large Language Model Performance on Biomedical Benchmarks." The paper tackles a timely and important problem: grounding large language models in mechanistic evidence to reduce unsupported claims. It does so with a thoughtful design that layers BTE-RAG over a federation of approximately 60 biomedical APIs and evaluates three complementary DrugMechDB-derived benchmarks (gene, metabolite, drug to process). The manuscript is clearly written, the technical contribution is meaningful, and the experimental results are promising.

      Recommendation: Major revision.

      Below are concrete, actionable changes that would bring the work in line with GigaScience's standards for FAIR availability, licensing, documentation, testing, and reproducibility. Many are straightforward, but together they matter for long-term reuse and auditability.

      1) Statistical rigor: paired inference, uncertainty, variance

      The manuscript reports compelling descriptive gains. Because each benchmark item is answered under both conditions (LLM-only and BTE-RAG), the study is a paired design. In paired settings, descriptive plots and point estimates are not sufficient to establish that improvements exceed sampling noise or threshold tuning. Please add paired statistical evidence that quantifies: (i) whether the gains are reliable, (ii) how large they are in practical terms, and (iii) how stable they are under repeated runs or under a fully deterministic pipeline.

      Gene task (binary): Report McNemar's test on the existing 2×2 tables, along with 95 percent Wilson confidence intervals for each condition and a Newcombe confidence interval for the accuracy difference. Keep the flip counts in the text.
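As a rough illustration of the paired analysis requested for the binary gene task, the sketch below computes an exact two-sided McNemar p-value from the two discordant ("flip") counts and a Wilson score interval for a single accuracy. It is not the authors' pipeline: the function names and the counts in the demo are hypothetical placeholders, and the Newcombe interval for the difference is not covered here.

```python
import math


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the discordant counts.

    b: items answered correctly by the baseline only (right-to-wrong flips)
    c: items answered correctly by the retrieval condition only (wrong-to-right flips)
    Under the null hypothesis each discordant item is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical flip counts, for illustration only (not taken from the paper).
    print("McNemar p-value:", mcnemar_exact(b=12, c=40))
    print("Wilson 95% CI:", wilson_interval(successes=150, n=200))
```

Only the discordant pairs enter McNemar's test, which is why the reviewer asks for the flip counts to stay in the text.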

      Metabolite and drug-to-process tasks (similarity): Report paired bootstrap confidence intervals or Wilcoxon signed-rank tests on per-item similarity differences (BTE-RAG minus baseline). Include a nonparametric effect size such as Cliff's delta with its confidence interval.
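For the similarity-based tasks, one way to obtain the requested uncertainty is sketched below: a percentile bootstrap over per-item differences (BTE-RAG minus baseline) and Cliff's delta as a nonparametric effect size. This is a minimal sketch under stated assumptions; the function names, the number of resamples, and the demo scores are hypothetical, and Cliff's delta is computed here over all cross pairs of the two score lists.

```python
import random
import statistics


def paired_bootstrap_ci(baseline, treated, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-item difference (treated - baseline)."""
    if len(baseline) != len(treated):
        raise ValueError("paired design requires equal-length score lists")
    diffs = [t - b for b, t in zip(baseline, treated)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample items (not individual scores) so the pairing is preserved.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), lo, hi


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross pairs, in [-1, 1]."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


if __name__ == "__main__":
    # Hypothetical cosine similarities, for illustration only.
    base = [0.71, 0.83, 0.90, 0.65, 0.88]
    rag = [0.80, 0.91, 0.93, 0.70, 0.86]
    print(paired_bootstrap_ci(base, rag))
    print(cliffs_delta(rag, base))
```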

      Threshold validation: Treat the greater-than-or-equal-to 0.90 "high-fidelity" threshold as a choice that should be validated. Show sensitivity across nearby cutoffs such as 0.85, 0.90, and 0.95, and add a small blinded expert adjudication (about 50 to 100 items) to confirm that the high-cosine band corresponds to acceptable correctness.
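The cutoff sensitivity check can be sketched as a small table of "high-fidelity" shares per condition at each candidate threshold. The function names and the scores in the demo are hypothetical; only the cutoffs 0.85, 0.90, and 0.95 come from the review.

```python
def share_at_or_above(scores, cutoff):
    """Fraction of items whose similarity score is at least `cutoff`."""
    return sum(s >= cutoff for s in scores) / len(scores)


def threshold_sensitivity(baseline, treated, cutoffs=(0.85, 0.90, 0.95)):
    """Map each cutoff to (baseline share, treated share) so the gain can be
    compared across thresholds instead of at a single chosen value."""
    return {
        c: (share_at_or_above(baseline, c), share_at_or_above(treated, c))
        for c in cutoffs
    }


if __name__ == "__main__":
    # Hypothetical cosine similarities, for illustration only.
    base = [0.80, 0.86, 0.91, 0.96]
    rag = [0.88, 0.92, 0.93, 0.97]
    for cutoff, (b, t) in threshold_sensitivity(base, rag).items():
        print(f"cutoff {cutoff:.2f}: baseline {b:.2f}, retrieval {t:.2f}")
```

If the gain appears only at one cutoff and vanishes at its neighbours, that would suggest the headline figure depends on the threshold choice.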

      Variance or determinism: Either document end-to-end determinism (frozen retrieval caches, fixed ordering, pinned embeddings) or run at least three replicates and report mean and standard deviation.

      These additions convert the current descriptive story into paired inference with uncertainty and effect sizes and clarify robustness around thresholding and reproducibility.

      2) Benchmark scope and generalizability

      All three evaluations are derived from DrugMechDB, which makes the study internally consistent but also couples the tasks to a single curation philosophy and evidence distribution. Please acknowledge this limitation explicitly in the Discussion and, ideally, add an external validation on at least one independent source to demonstrate generalizability. Options include CTD (drug-gene-process links), Reactome or GO (pathway and process grounding), DisGeNET (gene-disease associations), or a lightweight question answering set sourced outside DrugMechDB. Even a modest external set of about 100 to 200 items, evaluated with the same paired protocols and identifier-based scoring, would strengthen the claim. If full external validation is not feasible for this revision, please include robustness checks such as a date-based split, entity-family holdouts, and per-source ablations.

      3) Licensing, attribution, and persistent identifiers

      The project is MIT-licensed and adapts components from BaranziniLab/KG_RAG (Apache-2.0) and SuLab/DrugMechDB (CC0-1.0). To meet license obligations and align with FAIR and the Joint Declaration of Data Citation Principles, please: (i) keep Apache-licensed code under Apache with the upstream LICENSE and NOTICE files, noting any modifications; (ii) include the CC0 dedication text for any DrugMechDB artifacts and note that CC0 provides no patent grant; (iii) archive with DOIs (GigaDB preferred?) the three benchmarks, the exact evaluation caches used in the paper, and a tagged software release of the repository; (iv) license datasets under CC0 or CC BY while keeping the code MIT; (v) add a short Data and Software Availability table listing artifact, DOI or URL, license, and version or date.

      4) Error analysis and degradation cases

      Please add a brief failure analysis focused on where BTE-RAG reduces accuracy relative to LLM-only. At minimum, report the total number and percent of right-to-wrong flips per task and include a small set of representative cases. For each example, show the input, expected and predicted outputs, the top retrieved evidence with identifiers and timestamps, and a one-line diagnosis of the likely cause (for example normalization mismatch, retrieval coverage gap, ranking or filtering that hid relevant context, or long-context truncation). A short summary that groups the main causes into two or three buckets will make the results more interpretable and point to practical fixes.

      5) Methodological transparency: embedding and scoring models

      Please add two or three sentences in Methods explaining why S-PubMedBERT-MS-MARCO is used for filtering retrieved context while a BioBERT-based model is used for semantic similarity scoring, and what advantages each provides over plausible alternatives. A brief rationale will strengthen methodological transparency.

      6) Reproducibility workflow and archived caches

      Because BTE federates live APIs, results can drift as sources update. Please archive the exact retrieval caches used in evaluation with DOIs and minimal provenance if at all possible (query identifier, subject and object identifiers, predicate, source name and version or access date, any confidence score, and a retrieval timestamp).

      In summary, this is a promising and well-motivated study that could make a useful contribution once the statistical evidence, FAIR availability, and reproducibility pieces are tightened as outlined above. I recommend Major Revision and am happy to re-review a revised version.

    1. Abstract. Advances in spatial omics enable measurement of genes (spatial transcriptomics) and peptides, lipids, or N-glycans (mass spectrometry imaging) across thousands of locations within a tissue. While detecting spatially variable molecules is a well-studied problem, robust methods for identifying spatially varying co-expression between molecule pairs remain limited. We introduce SpaceBF, a Bayesian fused modeling framework that estimates co-expression at both local (location-specific) and global (tissue-wide) levels. SpaceBF enforces spatial smoothness via a fused horseshoe prior on the edges of a predefined spatial adjacency graph, allowing large, edge-specific differences to escape shrinkage while preserving overall structure. In extensive simulations, SpaceBF achieves higher specificity and power than commonly used methods that leverage geospatial metrics, including bivariate Moran’s I and Lee’s L. We also benchmark the proposed prior against standard alternatives, such as intrinsic conditional autoregressive (ICAR) and Matérn priors. Applied to spatial transcriptomics and proteomics datasets, SpaceBF reveals cancer-relevant molecular interactions and patterns of cell–cell communication (e.g., ligand–receptor signaling), demonstrating its utility for principled, uncertainty-aware co-expression analysis of spatial omics data.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag006), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Daniel Domovic

      Dear authors,

      I read your manuscript "SpaceBF: Spatial coexpression analysis using Bayesian Fused approaches in spatial omics datasets" with interest.

      The manuscript presents SpaceBF, a Bayesian method for detecting spatial co-expression between pairs of molecules in spatial omics data. The topic is relevant since new technologies like spatial transcriptomics, mass spectrometry imaging, and multiplex immunofluorescence produce large data but current tools for co-expression are limited. The authors try to solve this gap with a new model and they also test it on real datasets. The paper is technical, but it also gives biological examples, which is helpful for readers.

      The paper has many strong points. First, the idea to use Bayesian fused horseshoe prior together with MST spatial structure is new and well explained. Second, the authors apply their method on three real datasets and they show interesting biology, for example IGF2-IGF1R relation, keratin isoform consistency, and stromal ECM peptides. Third, I appreciate that the code is open on GitHub. Also, the paper compares with other methods and deals with the common problem of variance-stabilizing transform by modeling UMI counts directly with negative binomial distribution.

      Overall, the work is clear and well organized, but there are some points where more explanation or clarification would help. In my review I give major and minor remarks that I hope will improve the paper.

      Major remarks

      1. Were you worried that choosing MST may oversimplify spatial relationships, since many meaningful local neighborhoods may be excluded? Would the results of SpaceBF be significantly different if a different spatial graph, such as kNN, Delaunay triangulation, or kernel-based, was used instead of MST?
      2. Since MST edges depend a lot on pairwise L2 distances, how stable are the results if spatial coordinates are a little noisy, or if there are tissue registration errors?
      3. The model puts one molecule as outcome and the other as predictor. Are the co-expression estimates still the same if you switch roles?
      4. In the Results you mention "FDR < 0.1." Can you explain which method you used for FDR? Also, are the discoveries robust if you change the threshold (for example 0.05 vs 0.1)?
      5. Do the simulation parameters (lengthscale, slope, dispersion) correspond to realistic biological signal strengths and spatial scales observed in real datasets? Three values of the lengthscale l are considered, l = 3.6, 7.2, 18. Why exactly these values? What does ν=0.75 mean in terms of effect size? How does l=18 compare to real tissue lengthscales?
      6. Can you describe runtime and memory for larger datasets, like 10X Visium with 5,000-20,000 spots? Is the current MCMC practical for this scale, or do you think approximate inference (like variational Bayes or INLA) is needed?
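To make the first remark concrete, the sketch below builds a Euclidean minimum spanning tree and a k-nearest-neighbour graph over the same spot coordinates and compares their edge counts. It is an illustration only, not SpaceBF's implementation: the function names, the 3×3 grid, and the choice k = 4 are assumptions made for this example.

```python
import math


def mst_edges(coords):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a set of undirected edges (i, j) with i < j.
    """
    n = len(coords)
    if n < 2:
        return set()
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known connection to the tree
    best_from = [-1] * n        # tree node providing that connection
    best_dist[0] = 0.0
    edges = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.add((min(u, best_from[u]), max(u, best_from[u])))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(coords[u], coords[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def knn_edges(coords, k):
    """Undirected k-nearest-neighbour graph (union of each node's k neighbours)."""
    edges = set()
    for i, p in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


if __name__ == "__main__":
    grid = [(x, y) for x in range(3) for y in range(3)]  # hypothetical spot layout
    print("MST edges:", len(mst_edges(grid)))
    print("kNN (k=4) edges:", len(knn_edges(grid, 4)))
```

A spanning tree over n spots always has n - 1 edges, so many adjacent spot pairs share no edge, which is the oversimplification the reviewer asks about.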

      Minor remarks

      1. How sensitive are the results to the choice of hyperparameters for the Horseshoe prior?
      2. In the Results you state that keratins "co-express highly, meaning their binding patterns with any specific type 1 keratin should be similar." Please make clear that SpaceBF measures co-expression, not direct binding, so that conclusions are not overstated.
      3. You mention SpatialCorr and Copulacci, but the comparison was not successful. Even if parameters were sensitive, I think one short numerical comparison in the supplement would be helpful.
      4. You filter out genes with fewer than ~59 total reads (0.2 x number of spots). Can you justify the choice of this threshold and show if results are stable for other thresholds (for example 0.1x or 0.5x)? Since many ligands and receptors are lowly expressed, is there a risk of losing meaningful biology? Since the dataset has only 293 spots, thresholds can have a strong effect.

    1. The security and metadata tooling built on top of these registries tends to be US-based regardless of where the registry itself is hosted. A European company running Forgejo for code hosting still typically uses US services for dependency updates, vulnerability scanning, license compliance, and SBOM generation. Self-hosting the forge doesn’t change the intelligence layer.

      A common situation: you stack tools, and may have only one of them hosted in the EU or self-hosted.

    1. 2.3.5. Compilers and Programming Languages

       History

       In the early 1950s, Grace Hopper proposed a better way of programming a computer. She suggested creating a “programming language” based on English words with a “compiler” computer program that would turn the computer language code into binary computer instructions. [Photo: Grace Hopper c. 1960, at that time a Commander in the US Navy.] When Hopper’s ideas were mostly ignored, she proceeded to create her own compiler and later helped design some of the most important and influential early programming languages and compilers.

       The new set-up for programming

       So, thanks to Grace Hopper, we now have a new set-up for computer programming, which is what programmers still use today: When someone wants a computer to perform a task (that hasn’t already been programmed), a human programmer will act as a translator to translate that task into a programming language. Next, a compiler (or interpreter) program will translate the programming language code into the binary code that the computer runs. In this set-up, the programming language acts as an intermediate language the way that French did in my earlier analogy. In this set-up, a programmer’s basic task is to do these three things: given a problem, break it down into steps for a computer; write those steps down in a programming language; and run the compiler or interpreter, so the computer program can run on the computer.

       Programming languages

       Programming languages (e.g., Python, R, Java) are specially designed languages that attempt to split the difference between how a computer thinks and communicates and how people think and communicate. There are many programming languages, with different specializations and trade-offs. In this book, we will use Python, which is commonly used in data science tasks, and has support for writing programs that work with Reddit.

       Compilers / Interpreters

       Compilers are special programs that translate code written in a programming language into the binary 0s and 1s that a computer runs. There are two varieties of compilers: a standard compiler, which takes a whole computer program and turns it all into binary so it can be run later; and an interpreter, which turns the computer language code into binary as it is running the program. Python uses an interpreter, so when you run a Python program, the interpreter translates the Python code into binary while it’s running it.

       Programming in this book

       Throughout the rest of this book, we will take ideas for programs written in English and translate them into Python code, and we will look at Python code and translate it back into English descriptions of what the code does. The Python Interpreter will then translate this code into binary instructions, which the computer will then run. Next, let’s look at an example computer program that posts one tweet.
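The three-step set-up described in the passage can be sketched with a toy task. The task (averaging a list of numbers) and the function name `average` are made up for illustration and do not come from the book: the English steps appear as comments, the Python lines beneath them are the programmer's translation, and running the file hands that code to the Python interpreter.

```python
# Task in English: "Find the average of a list of numbers."


def average(numbers):
    # Step 1: add up all of the numbers.
    total = sum(numbers)
    # Step 2: count how many numbers there are.
    count = len(numbers)
    # Step 3: divide the total by the count.
    return total / count


# Running this file asks the Python interpreter to carry out the steps.
print(average([2, 4, 6]))
```

Reading the comments alone gives the English description back, which is the reverse translation the passage mentions.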

      Grace Hopper’s work shows how programming languages and compilers make computers more accessible to humans by acting as a bridge between human language and machine code. By introducing higher-level languages and compilers, she shifted programming from thinking only in binary to thinking in structured steps, which made software development more flexible and powerful. This structure also highlights that programmers play a key role in translating human intent into actions computers can execute.

    1. Differentiate between law, ethics, bioethics, etiquette, and protocol, and explain what can happen when actions fall outside acceptable boundaries. You'll also learn why having a code of ethics is essential in healthcare. Explore ethical issues related to professional responsibilities and how healthcare workers interact with patients, families, and coworkers. Start building a foundation for supporting dying patients and their families and begin forming your own views about death and the grieving process. Apply ethical principles and theories to real-world healthcare scenarios. Gain a basic understanding of the four major ethical theories and use them to analyze case studies.

      I really like and appreciate how you explain what should be learned in the course if we are successful in it. It helps me understand what I should get out of the course and how it will help me in my future career.

    2. You should approach course topics with an open mind and be ready to hear ideas and opinions that differ from yours

      I like this section because during the discussions I'll be able to write my own opinions and read other people's opinions on different topics, which is where the Vaquero Honor Code plays a part.

    1. Youtuber Innuendo Studios talks about the way arguments are made in a community like 4chan: “You can’t know whether they mean what they say, or are only arguing as though they mean what they say. And entire debates may just be a single person stirring the pot [e.g., sockpuppets]. Such a community will naturally attract people who enjoy argument for its own sake, and will naturally trend toward the most extreme version of any opinion. In short, this is the free marketplace of ideas. No code of ethics, no social mores, no accountability. … It’s not that they’re lying, it’s that they just don’t care. […] When they make these kinds of arguments they legitimately do not care whether the words coming out of their mouths are true. If they cared, before they said something is true, they would look it up.” (The Alt-Right Playbook: The Card Says Moops, by Innuendo Studios) While there is a nihilistic worldview where nothing matters, we can see how this plays out practically, which is that they tend to protect their group (normally white and male), and tend to be extremely hostile to any other group. They will express extreme misogyny (like we saw in the Rules of the Internet: “Rule 30. There are no girls on the internet. Rule 31. TITS or GTFO - the choice is yours”), and extreme racism (like an invented Nazi My Little Pony character). Is this just hypocritical, or is it ethically wrong? It depends, of course, on what tools we use to evaluate this kind of trolling. If the trolls claim to be nihilists about ethics, or indeed if they are egoists, then they would argue that this doesn’t matter and that there’s no normative basis for objecting to the disruption and harm caused by their trolling. But on just about any other ethical approach, there are one or more reasons available for objecting to the disruptions and harm caused by these trolls! If the only way to get a moral pass on this type of trolling is to choose an ethical framework that tells you harming others doesn’t matter, then it looks like this nihilist viewpoint isn’t deployed in good faith. Rather, with any serious (i.e., non-avoidant) moral framework, this type of trolling is ethically wrong for one or more reasons (though how we explain it is wrong depends on the specific framework).

      This reading helped me see that trolling in spaces like 4chan isn’t just about “free speech” or joking, but about a lack of care for truth and harm. The idea that arguments are made without concern for whether they are true explains why these communities drift toward extreme misogyny and racism. While trolls may claim a nihilistic or egoist stance, this feels less like a genuine ethical position and more like a shield to avoid responsibility. Under almost any serious moral framework, the deliberate disruption and harm caused by trolling is ethically wrong, especially when it consistently targets marginalized groups.

    2. 7.6.3. Trolling and Nihilism. While trolling can be done for many reasons, some trolling communities take on a sort of nihilistic philosophy: it doesn’t matter if something is true or not, it doesn’t matter if people get hurt, the only thing that might matter is if you can provoke a reaction. We can see this nihilism show up in one of the versions of the self-contradictory “Rules of the Internet”: “8. There are no real rules about posting … 20. Nothing is to be taken seriously … 42. Nothing is Sacred.” Youtuber Innuendo Studios talks about the way arguments are made in a community like 4chan: “You can’t know whether they mean what they say, or are only arguing as though they mean what they say. And entire debates may just be a single person stirring the pot [e.g., sockpuppets]. Such a community will naturally attract people who enjoy argument for its own sake, and will naturally trend toward the most extreme version of any opinion. In short, this is the free marketplace of ideas. No code of ethics, no social mores, no accountability. … It’s not that they’re lying, it’s that they just don’t care. […] When they make these kinds of arguments they legitimately do not care whether the words coming out of their mouths are true. If they cared, before they said something is true, they would look it up.” (The Alt-Right Playbook: The Card Says Moops, by Innuendo Studios) While there is a nihilistic worldview where nothing matters, we can see how this plays out practically, which is that they tend to protect their group (normally white and male), and tend to be extremely hostile to any other group. They will express extreme misogyny (like we saw in the Rules of the Internet: “Rule 30. There are no girls on the internet. Rule 31. TITS or GTFO - the choice is yours”), and extreme racism (like an invented Nazi My Little Pony character). Is this just hypocritical, or is it ethically wrong? It depends, of course, on what tools we use to evaluate this kind of trolling. If the trolls claim to be nihilists about ethics, or indeed if they are egoists, then they would argue that this doesn’t matter and that there’s no normative basis for objecting to the disruption and harm caused by their trolling. But on just about any other ethical approach, there are one or more reasons available for objecting to the disruptions and harm caused by these trolls! If the only way to get a moral pass on this type of trolling is to choose an ethical framework that tells you harming others doesn’t matter, then it looks like this nihilist viewpoint isn’t deployed in good faith. Rather, with any serious (i.e., non-avoidant) moral framework, this type of trolling is ethically wrong for one or more reasons (though how we explain it is wrong depends on the specific framework).

      This section helped me think about trolling in a much more nuanced way, especially the idea that disruption itself isn’t automatically good or bad. I found the discussion about group formation and norm enforcement really useful, because it explains why trolling can feel threatening—it challenges the patterns and signals that groups rely on to define who belongs. The comparison between trolling, protest, and revolution also stood out to me, since it shows how moral judgment often depends on whether we see the existing social order as legitimate. Overall, this section made it clear that evaluating trolling ethically requires looking beyond intent or humor and examining what is being disrupted and who is harmed or protected by that disruption.

    1. The politician who accepts "favors" and in return provides protection to the illegal enterprises of organized crime not only gives a living witness to the criminal's world view but also confirms for the organized criminal, in violating his oath of office, that a criminal's code is superior to society's

      The code used by these organized criminals does more than contribute to the romanticization of criminal life by allowing them to act as though they stand for something, and more than reinforce a sense of superiority by getting others to buy into it. Related to that superiority, there is a sense of power in deciding what rules they live by: it is another part of their lives over which they can exert control rather than conform.

      1. You were not as qualified as the other applicants.

      This is very harsh and direct. You could say, "We were impressed with your application, but unfortunately, we cannot accept it this year."

      1. I won't stay late to do that assignment.

      This is a rude way of saying, I am sorry, but tonight I have a lot of things on my plate, but I can get back to it tomorrow.

      1. Parking fees have increased this year.

      This seems like a way of delivering bad news without giving any follow-up information. I would have said that, due to rising city water bills and road costs, parking fees had to be increased.

      1. We will not authorize any more vacation requests for the month of July.

      This is very direct and seems to have zero wiggle room. I would have said, because July is our busiest month, there will be no more vacation days given out, but I will make it up to you at some other point in the year.

      1. Employees are not allowed to telework on Mondays and Fridays.

      This is very inflexible and rigid for those who might need to work online on those days. I would have said that there will be a formal sign-up sheet every week with the same number of slots per day, so that employees can plan out their week in advance.

      1. You are dressed inappropriately for the office.

      This is rude, as they may not have realized it was inappropriate. I would have said your attire is not in compliance with the guidelines put in place by our office dress code.

    1. You are dressed inappropriately for the office.

      Feels personal and embarrassing. Instead, I would say, "Our office follows a business-casual dress code, and today’s outfit does not align with those guidelines."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons that exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum, where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function, which ultimately allows the animal to correct its orientation. It represents an example of systems neuroscience explaining how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective, showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear, and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims, and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

      Thank you very much for these comments.

      Weaknesses:

      The evidence supporting the claim that the neural circuitry presented here controls the cilia beating is more correlational because it only relies on the fact that the location of the two types of ANN neurons coincides with the quadrants that are affected in the behavioral recordings. Discussing ways by which causality could be established might be helpful.

      We have now added further discussion in a new “Future Directions” section explaining that, for example, calcium imaging or targeted neuron ablations could be used in future work to establish causality. This would require the development of genetic delivery techniques to introduce, e.g., a GCaMP calcium sensor or transgenic reporters.

      The explanation of the relevance of this work could be improved. The conclusion that the work hints at coordination instead of feedforward sensory-motor control is explained over only a few lines. The authors could provide a more detailed explanation of how the two models compete (coordination vs feedforward sensory-motor control), and why choosing one option over the other could provide advantages in this context.

      We added a more detailed explanation about the two types of model and why we believe that a coordination model is more compatible with our connectome data.

      “An alternative model for the function of the nerve net would be a feedforward sensory-motor system, in which balancer cells provide mechanosensory input to motor effectors via the nerve net, similar to a reflex arc. None of our observations support such a sensory-motor model. There are no synaptic pathways from balancer cells or any other sensory cells to the nerve net. The only synaptic input to ANNs comes from the bridge cells (discussed below) and from each other. The three synaptically interconnected ANNs may generate endogenous rhythm that controls balancer cilia and is influenced by bridge input. ANNs may also be influenced by neuropeptides secreted by other aboral organ neurons. Such chemical inputs may underlie the flexibility of gravitaxis and its modulation by other cues (e.g. light). Overall, the coordination model parsimoniously explains both the ANN wiring topology and the observed dynamics, whereas a simple feedforward reflex does not.”

      Since the fact that the ANN neurons form a syncytium is an important finding of this study, it would be useful to have additional illustrations of it. For instance, pictures showing anastomosing membranes could typically be added in Figure 2.

      We have now included a movie (Video 3) showing a volumetric reconstruction of a segment of an ANN neuron, which highlights the anastomosing morphology in greater detail than static images.

      “Video 3. Volumetric reconstruction of a single ANN Q1-4 neuron showing syncytial soma (cyan) and nuclei (magenta). The rotating view highlights the anastomosing morphology, although not all fine details could be reconstructed due to data limitations.”

      Also, to better establish the importance of the study, it could be useful to explain why the balancers’ cilia spontaneously beat in the first place (instead of being static and just acting as stretch sensors).

      We have discussed in more detail why it may be important for the balancer cilia to beat.

      “The observation that balancer cilia beat spontaneously, even in the absence of external tilt, suggests that they are active sensory oscillators rather than static stretch sensors. Their spontaneous beating could set a dynamic baseline of sensitivity, which can then be modulated by ANN inputs or sensory changes during tilt. Such a dynamic system may be more sensitive to small deflections and be more responsive [@Lowe1997]. Thus, the regulated beating of balancer cilia should not be seen as noise, but as an adaptive feature that enables flexible and robust graviceptive responses. The ctenophore balancer may thus use active ciliary oscillations for enhanced sensorimotor integration similar to other sensory systems [@Wan_2023].”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ’s balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day-old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because it is apparently the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

      We thank the reviewer for these comments.

      Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day-old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons. This also gives some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Thank you for these positive comments on the paper.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In consultation, the reviewers recommend that improving the evidence to “exceptional” would require additional perturbation experiments (e.g., ablation of specific neurons), as Reviewer 1 suggests. They also recommend adding a “Future Directions” section to the manuscript, because it opens up so many new experimental directions.

      We have added a new “Future Directions” section at the end of the Discussion. To carry out the proposed perturbation or calcium imaging experiments would require significant additional work and method development. We are actively working on establishing mRNA and DNA injection into ctenophore zygotes to enable live imaging, cell labelling or ablations in the future.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data, or analyses:

      To establish causality (neurons control balancer cilia), an important experiment would be to manipulate each of these neuronal populations (e.g., by ablating them) and measure the effect of these ablations on the beating frequency of the balancer cilia of the four quadrants. Moreover, direct observation of neuronal activity (e.g., by using calcium imaging) would also provide more compelling evidence for neuronal control.

      We agree with the reviewer that such perturbation experiments would be needed to establish causality. Such experiments are currently still not possible in ctenophores and would require significant technology development. We discuss such experiments in the “Future directions” section and also place this in the context of the currently available techniques in ctenophores. We are actively working on this, but waiting for such technological breakthroughs and new experiments would significantly delay the publication of a version of record of the paper.

      Recommendations for improving the writing and presentation:

      ANN neurons are described in great detail, though SNN neurons are described more loosely. Perhaps a more detailed description of SNN neurons would be helpful.

      We added the information on SNNs to show that these cells are distinct from the ANN neurons. Since our focus is on the aboral organ, we did not aim for a comprehensive reconstruction of SNNs. Several of the processes of the SNNs are also truncated and outside our EM volume. We have nevertheless added additional details about the morphology and connectivity of SNN neurons.

      “Near the periphery of the aboral organ, we identified four further anastomosing nerve-net neurons. These resembled the previously reported syncytial subepithelial nerve net (SNN) neurons in the body wall of Mnemiopsis (Figure 2–figure supplement 1C–G) and were clearly distinct from the ANN neurons (both in location and morphology). SNN neurons show a blebbed morphology and contain dense core vesicles [@Burkhardt2023] but no synapses.”

      Minor corrections to the text and figures:

      (1) Figure 2 C): “mitochondia” instead of “mitochondria”.

      corrected

      (2) Figure 3. Title: “balancer and and bridge”.

      corrected

      (3) Figure 3.C) “shown in xxx color”

      corrected

      Reviewer #2 (Recommendations for the authors):

      Clearer usage of the terms statocyst, aboral organ, aboral nerve net, statolith, dome, and lithocytes would be helpful. For readers not familiar with ctenophore anatomy, things can get a bit confusing. A single schematic with all of these terms would be helpful. In Figure 1E, there is a label “dc”. Should this be “do”?

      We have added an annotated schematic to Figure 1, explaining these terms.

      Figure 1C “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      Reviewer #3 (Recommendations for the authors):

      My comments are numerous, but mostly minor suggestions for improving the clarity.

      [Suggested insertions/changes are indicated by square brackets]

      (1) [It would be much easier to review this if there were line numbers, or with a double-spaced manuscript that was more accommodating for markup.]

      Thank you for this comment. We have increased the line spacing in the revised version. (We set the CSS line-height property on the html ‘body’ element to 2em).

      (2) The terms statolith, statocyst, and lithocytes can be confusing, so it would be nice to have an upfront definition of how they relate to each other.

      We have now explained these terms in the Introduction and have also improved the annotation of Figure 1.

      Figure 1C: “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      (3) Statolith is spelled as statolyth in the early pages, but statolith in the later pages. I think -lith is more common, but in any case, these should be standardized.

      corrected to ‘statolith’

      ABSTRACT:

      (1) Differential load[s] on the balancer cilia [lead] to altered

      changed

      (2) We used volume electron microscopy (vEM) to image the aboral organ.

      changed

      (3) also form reciprocal connections with the bridge cells.

      corrected

      INTRODUCTION:

      (1) “identify conserved neuronal markers in ctenophores” - confusing - does this mean conserved across ctenophores, or conserved in ctenophores and other animals?

      changed to “classical neuronal markers”

      (2) “either increase or decrease their [ciliary] activity, indicating” - otherwise it sounds like the balancers are increasing activity.

      changed to “balancer cells may either increase or decrease their ciliary activity”

      (3) after “matches the setup used in high-speed imagine experiments”, it might be nice to add a statement like “Future studies could potentially investigate activity in the inverted orientation, when the statolith is suspended below the cilia, to see if the response differs.”

      In this sentence we referred to the orientation of the animals in our figures. There is a consensus among ctenophore researchers that when depicting ctenophores, the aboral organ should face downwards. However, for this paper we chose the opposite orientation to better match our experiments and help interpret the results. We changed the text to: “In this study, we represent ctenophores with their aboral organ facing upwards (“balancer-up” posture), as this configuration facilitates intuitive interpretation of balance-like functions and matches the setup used in high-speed imaging experiments.”

      We added the sentences “Future experiments could also explore how orientation affects the response of balancer cilia. For example, when the statolith is suspended below the cilia (the “balancer-down” posture), ciliary beating patterns may differ from what we observed here in the “balancer-up” configuration.” to the “Future Directions” section.

      (4) “abolished by calcium[-]channel inhibitors”

      corrected

      (5) “By functional imaging, we uncovered” - It is not clear what functional imaging is. Maybe a few-word definition here, and be sure to explain in the methods.

      changed to “By high-speed ciliary imaging”. The details of the imaging are explained in the Methods section under “Imaging the Activity of Balancer Cilia”.

      RESULTS:

      (1) “five-day-old” - is it worth saying post-fertilization here?

      Thank you for pointing this out. In accordance with Presnell et al. (2022), we use post-hatching as the reference. We have revised the text in the Materials and Methods section to read: “5-day-old (5 days post-hatching)”

      (2) “We classified these cells into cell types [based on …]” - specify a bit about how you classified them based on morphology, the presence of organelles, etc.

      We added a clarification. “Our classification was based on i) ultrastructural features (e.g. number of cilia), ii) cell morphology (e.g. nerve net or bridge cells), iii) unique organelles (e.g. lamellate body, plumose cells), iv) and similarities to cell types previously described by EM. Our classification agrees with the cell types identified in the 1-day-old larva [@ferraioli2025].”

      (3) “CATMAID only supports [bifurcating] skeleton trees” - Correct?

      yes, a node in CATMAID cannot be fused to another node of the same skeleton to represent anastomoses
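
      The constraint can be illustrated with a minimal Python sketch. It is not CATMAID's API or data model; the node IDs and the helper `closes_loop` are hypothetical. A skeleton stored as a tree has exactly one path between any two nodes, so an edge that fuses two nodes of the same skeleton would close a loop, which is what an anastomosis requires.

```python
def closes_loop(edges, new_edge):
    """Return True if new_edge joins two nodes that edges already connect."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, v in edges:
        parent[find(u)] = find(v)  # merge the two components

    a, b = new_edge
    return find(a) == find(b)


# Toy skeleton (hypothetical node IDs): 1-2, with branches 2-3 and 2-4-5.
skeleton_edges = [(1, 2), (2, 3), (2, 4), (4, 5)]

# Fusing branch tip 3 with branch tip 5 would create a cycle (an anastomosis).
fusion_creates_cycle = closes_loop(skeleton_edges, (3, 5))

# Extending the skeleton with a brand-new node 6 keeps it a tree.
extension_creates_cycle = closes_loop(skeleton_edges, (5, 6))
```

      In this toy case the fusion edge is flagged as a loop while the extension is not, which is why an anastomosing neuron cannot be stored as a single tree-shaped skeleton without some workaround.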

      FIGURE 1:

      (1) It is not worth redrawing and renumbering everything, but I wish the lateral view in A matched the rotated aboral view in B, instead of having to do two rotations to get the alignment to coincide. (Rotating panel B 90° clockwise would make them match, but then it wouldn’t coincide with all the subsequent figures.)

      Thank you for the suggestion. We have replaced panel A with a lateral view that now matches panel B.

      (2) The labels on Figure 1 are a mix of two typefaces (Helvetica and Myriad?). They should be standardized to all use one typeface (preferably Helvetica).

      we have changed the font to Helvetica

      (3) Panel C legend: arrows are not really arrows. Say “Eye icons” or something like that. Can you show the location of the anal pores in the DIC image?

      Changed to ‘eye icons’. The anal pores are usually closed and only open briefly, so it is not clear where exactly they would be; indicating their position would be misleading.

      (4) Panel F, I cannot see the lines mentioned in the legend at all, except for maybe a tiny wisp in a couple of places. Either omit or make visible.

      changed to “The spheres indicate the position of nuclei in the reconstructed cells.”

      (5) Panel G. “Cells are color coded according to quadrants”… but unfortunately, the color scale is 90° off of what is presented in the rest of the panels and the paper. Q1 and Q3 have been blue, but now Q2+4 are blue/purple, while Q1+3 are orange/yellow. Again, it seems like too much work to recolor panel G, but in future, it would be nice to maintain that consistency, especially since other panels specifically mention the consistent colors.

      We have changed the color code in panels B, C and E to match G and the subsequent panels/figures.

      RESULTS: Aboral synaptic nerve net

      (1) “We reconstructed three aboral nerve-net (ANN) neurons” - out of how many total? Were these three just the first ones traced, or are they likely to be all of the multi-domain neurons? One can’t tell if these are the top 3 (out of X), or if there are other multi-quad neurons that were not traced. Are there any Q1Q4 or Q2Q3 neurons? Specify overall composition.

      There are only three ANN neurons in the aboral organ. These are all completely reconstructed and contained within the volume. We have clarified this in the text. “We identified and reconstructed three aboral nerve-net (ANN) neurons, each exhibiting a syncytial morphology characterized by anastomosing membranes and multiple nuclei (ranging from two to five) (Figure 2A and B, Figure 2–figure supplement 1C). These three neurons are the only fully reconstructed ANN neurons contained within the volume. Several small ANN-like fragments were also observed at the periphery of the aboral organ, but their connectivity to the main ANN remains uncertain.”

      FIGURE 2:

      (1) Panel C: “N > 2 cells for each cell type” - is that supposed to say “N > 2 mitochondria”? More than 2 cells in all the types shown in the graph.

      It is number of cells for each cell type

      (2) Panel D: Is this the wrong caption? I can only see green and black circles, not red, yellow, or blue. Make them larger or “flat” (circled, not shaded spheres) if they are supposed to be visible

      Thank you for pointing this out. The caption was incorrect and has been corrected to match the figure.

      (3) Panel E: Amazing to see the cross-network connections!

      Thank you

      (4) Again, it is great to see the three ANN mapped out, but … are there other connections that weren’t mapped in this study? Other high-level coordinating neurons? ANN_Q1Q4 or Q2Q3?

      The reconstruction is complete and there are no other neurons or connections. Given the large size of ctenophore synapses, we are confident that we identified all or most synapses and their connections.

      RESULTS: Synaptic connectome

      (1) “displaying rotational symmetry” - This is one of the things I am most curious about. Where is the evidence of rotational symmetry in the network diagram? Is it the larger number of connections to Q2 and Q4? Any evidence of rotational symmetry, like Q1 and Q3 connect to Q2 and Q4 respectively, but not the other way around?

      changed to “displaying biradial symmetry”, we do not consider the slight difference in synapse number from ANN Q1-4 to the Q1-Q3 vs. Q2-Q4 balancers as significant or strong enough evidence for a single rotational symmetry (i.e. 180 degrees rotation)

      (2) “Surprisingly” - this *was* really surprising. There have to be some afferent neurons connecting from the balancers, don’t there? I can’t remember the connections to the SNN, but is there a tertiary set of ANNs that connect between the balancers and the top 3 ANNs? I would like a little more discussion about this.

      Indeed, this is why this is so surprising. Most people would have expected some output connections from the balancer to the nerve net or elsewhere. There are none. We have the complete balancer network and all balancer cells are ‘sink nodes’ (inputs only) (Figure 3–figure supplement 1).

      We added a short statement at the beginning of the “Bridge Cells as Feedback Regulators of Ciliary Rhythms” section noting that no direct connections from the balancers to the ANN were found and that all balancer cells act as sink nodes (inputs only; Figure 3–figure supplement 1). This highlights that bridge cells are indeed the sole neuronal input to the ANN circuit.
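
      What "sink node" means in a directed synaptic graph can be shown with a short Python sketch. The edge list below is a toy assumption for illustration only, not the reconstructed connectome, and the function name `sink_nodes` is hypothetical.

```python
def sink_nodes(edges):
    """Return nodes that receive at least one input but send no output."""
    presynaptic = {pre for pre, _ in edges}
    postsynaptic = {post for _, post in edges}
    return sorted(postsynaptic - presynaptic)


# Toy directed edges (presynaptic, postsynaptic); labels are illustrative only.
toy_edges = [
    ("bridge", "ANN"),
    ("ANN", "bridge"),
    ("ANN", "balancer_Q1"),
    ("ANN", "balancer_Q2"),
]

toy_sinks = sink_nodes(toy_edges)
```

      In this toy graph only the two balancer entries qualify, because they appear as targets and never as sources.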

      Figure 3:

      (1) As you know, during development, the diagonally opposite cells have a shared heritage and shared functionality. Are there neuronal signatures that correspond to the rotational symmetry that we see, for example, in the position of the anal pores?

      We did not find any evidence in neuronal complement for a diagonal symmetry, suggesting that neuronal organization does not simply mirror the organism’s rotational body symmetry.

      (2) Do you have the information to say whether there are any diagonal or asymmetric connections? Can’t tell if those would have shown up in the mapping efforts or if you focused on the major ones only.

      Based on our complete mapping, we did not find evidence for a diagonal pattern. The connectivity instead shows a biradial organization.

      (3) “extending across opposite quadrant regions” - to me, opposite would be diagonally opposite, but this looks like a set of cells between Q1 and Q2 is connecting to a sister-set in Q3+Q4. I wonder if, in a more detailed view, you could see whether this is a rotational correspondence, rather than a reflection. There are some subtle hints of this in the aboral view, with some cells on the right of the blue cluster and the left of the magenta cluster.

      changed to “extending across tentacular-axis-symmetric quadrant regions” for clarity

      (4) As with Figure 2, I do not see any circles/spheres that are yellow, red, or blue! There are some traces of what appear to be other neurons that have these colors, but nothing that would suggest the localization of mitochondria.

      Thank you for pointing this out. We have corrected the caption to match the figure, as in the previous item.

      (5) The connectivity map is very cool, but the caption does not seem to correspond to the version included in the manuscript. I don’t see any hexagons; all arrows seem to have the same thickness.

      changed to: “Complete connectivity map of the gravity-sensing neural circuit. Cells belonging to the same group are shown as diamonds, and the number of cells is added to their labels. The number of synapses is shown on the arrows.”

      RESULTS: Dynamics of balancer cilia

      (1) The orientation of the stage+larvae is a bit hard to follow. Maybe say the sagittal or tentacular plane is parallel to the sample stage and the gravity vector?

      we added “Larvae were oriented with their sagittal or tentacular plane parallel to the sample stage.”

      (2) “We could simultaneously image Q1(3) and Q2(4)”. The meaning of the numbers in () is not clear. Whichever way I try to interpret it, it does not match the diagrams. Should this say that, viewing the tentacular plane, you can image Q1 and 4 or Q2 and 3?

      Thank you for spotting this mistake, we have changed to: “In larvae with their sagittal plane facing the objective, we could compare balancer-cilia movements between Q1 vs. Q2 or Q3 vs. Q4. In other larvae oriented in the tentacular plane, we could simultaneously image Q1 and Q4 or Q2 and Q3.”

      (3) Typo: episod[e]s were excluded

      Corrected

      DISCUSSION:

      This section is quite clean. Maybe mention some future directions:

      We have added a “Future Directions” section

      (1) Do these networks change during development? Five-days-old is still quite undeveloped - what would it look like in an adult specimen? Would you expect a larger version of the same or more diverse connections?

      As far as we know from work on aboral organs in adult ctenophores, the same structures and cells can be found. We do not know how the network will develop. We know that at 5 days the balancer is fully functional and the animals can orient and their behaviour is coordinated. So the wiring may not change extensively later in development. In the 1-day-old larva, Ferraioli et al. did not distinguish ANN neurons as a separate population, as these were merged with SNNs in their dataset. This suggests that significant cellular and circuit maturation likely occurs between 1 and 5 days.

      METHODS: Imaging the Activity of Balancer Cilia

      (1) “we selected only larvae whose aboral-oral axis was oriented nearly perpendicular to the gravitational vector”. Shouldn’t this be “nearly parallel to the gravity vector” not perpendicular?

      Thank you for spotting this, corrected.

    1. You'll also learn why having a code of ethics is essential in healthcare

      I am happy I chose to take this course. I genuinely think it will help, since many students, myself included, want to go into healthcare jobs. Learning about the code of ethics and knowing why it is important will help us prepare for many ethical situations in a healthcare job.

    1. In automation theory, a "centaur" is a person who is assisted by a machine. A "reverse centaur" is someone who has been conscripted into assisting a machine. If you're a software engineer who uses AI to write routine code that you have the time and experience to validate, deploying your Fingerspitzengefühl and process knowledge to ensure that it's fit for purpose, it's easy to see why you might find using AI (when you choose to, in ways you choose to, at a pace you choose to go at) to be useful. But if you're a software engineer who's been ordered to produce code at 10x, or 100x, or 10,000x your previous rate, and the only way to do that is via AI, and there is no human way that you could possibly review that code and ensure that it will not break on first contact with the world, you'll hate it (you'll hate it even more if you've been turned into the AI's accountability sink, personally on the hook for the AI's mistakes)

      at a speed you can keep up with

    2. For a long time, firms have nurtured a false belief that code costs less to run over time: after an initial shakedown period in which the bugs in the code are found and addressed, code ceases to need meaningful maintenance.
    1. You already use different varieties of English in different parts of your life;

      I interpret this as knowing about the importance of "code-switching." Most people learn early in their lives to change their vocabulary, tone, and overall vibe depending on who they are talking to, whether it's a close friend, loved one, or authority figure.

    1. If they prefer to not have a smartphone that’s the life they choose to live.

      This strikes me as an equity problem framed as personal or individual choice. I'm thinking also about the proliferation of "free public washrooms" springing up around downtown where I live; in theory, these single-stall locking bathrooms improve access. In practice, you must have a smartphone to unlock the bathroom by way of a QR code. Where does this leave unhoused and more vulnerable populations who would likely benefit the most from safe, reliable, accessible bathroom access?

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine neural representations underlying abstract navigation in entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Interestingly, the three-fold pattern identified in the hippocampus explains quirks in participants' behavior where navigation performance follows a three-fold periodicity. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. The wide array and creativity of the techniques used is impressive but because of their unique nature, the paper would benefit from more details on how some of these techniques were implemented.

      Comments on revisions:

      Most of my concerns were adequately addressed, and I believe the paper is greatly improved. I have two more points. I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure. I also think the paper would benefit from more details regarding some of the analyses.

      Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed.

      (1) “…I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure…”.

      Thank you for pointing this out. We have revised the legend of Figure 4 by removing the significance notation “***: p < 0.001”, which referred to elements from a previous version of the figure.

      (2) “…I also think the paper would benefit from more details regarding some of the analyses. Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed”.

      We agree and appreciate the reviewer’s helpful suggestion. We have added a dedicated subsection entitled “Phase–amplitude coupling” to the Materials and Methods, in which we provide a detailed description of how the EC and HPC BOLD signals were reconstructed and how the coupling analysis was implemented. Correspondingly, we refined the description of this analysis in the Results section under “Phase synchronization between the HPC and EC activity”. The revised sections have been included below for your convenience. 

      Materials and Methods: Phase–amplitude coupling

      To quantify the spatial peak relationship between EC and HPC BOLD activity, we implemented a cross-frequency amplitude–phase coupling analysis in the directional space (Canolty et al., 2006). Rather than analyzing raw BOLD signals, we reconstructed 6-fold EC activity and 3-fold HPC activity in each voxel using sinusoidal modulation weights (β<sub>sine</sub> and β<sub>cosine</sub>) estimated from the raw BOLD signals. Specifically, activity was modeled as β<sub>cosine</sub>cos(kθ) + β<sub>sine</sub>sin(kθ), where k denotes the rotational symmetry. This approach selectively captures the hypothesized spatial symmetries of neural activity (e.g., 6-fold or 3-fold periodicity) as a function of movement direction. For this coupling analysis, we used participants’ original movement directions (i.e., without applying orientation calibration). The reconstructed 6-fold EC and 3-fold HPC activity were then converted into analytic representations using the Hilbert transform, yielding the instantaneous phase of the HPC (ϕ<sub>HPC</sub>) and the amplitude envelope of the EC (A<sub>ERC</sub>). HPC phases were classified into nine bins. The composite analytic signal, defined as z = A<sub>ERC</sub>e<sup>iϕ<sub>HPC</sub></sup>, was used to compute the modulation index M (Canolty et al., 2006), defined as the absolute value of the mean of z values, quantifying the scalar coupling strength between EC amplitude and HPC phase within each bin. A surrogate dataset, a null distribution of the modulation indices (M<sup>-</sup>), was generated by spatially offsetting the EC amplitude relative to the HPC phase across all possible spatial lags. The mean of this surrogate distribution was used as the baseline reference against which the observed coupling strength was compared.
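
      The steps described above can be sketched in Python with NumPy. This is an illustrative reading of the text, not the authors' code: the function names, the evenly spaced direction grid, the unit β weights, and the synthetic envelope `toy_envelope` (standing in for A<sub>ERC</sub> and chosen to peak at HPC phase 0) are all assumptions. The analytic signal is built directly with an FFT, which is the usual construction behind a discrete Hilbert transform.

```python
import numpy as np


def reconstruct(theta, beta_cos, beta_sin, k):
    """k-fold periodic activity: beta_cos*cos(k*theta) + beta_sin*sin(k*theta)."""
    return beta_cos * np.cos(k * theta) + beta_sin * np.sin(k * theta)


def analytic_signal(x):
    """FFT-based analytic signal: keep DC, double positive frequencies, zero the rest."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def binned_modulation_index(amplitude, phase, n_bins=9):
    """M = |mean(A * exp(i*phi))| within each phase bin; bins tile [-pi, pi]."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    bin_index = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    z = amplitude * np.exp(1j * phase)
    # Assumes every bin contains at least one sample.
    return np.array([np.abs(z[bin_index == b].mean()) for b in range(n_bins)])


def surrogate_baseline(amplitude, phase, n_bins=9):
    """Mean per-bin M over all non-zero circular offsets of the amplitude."""
    shifted = [
        binned_modulation_index(np.roll(amplitude, lag), phase, n_bins)
        for lag in range(1, len(amplitude))
    ]
    return np.mean(shifted, axis=0)


# Toy demonstration on an evenly spaced direction grid (assumption).
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
hpc = reconstruct(theta, beta_cos=1.0, beta_sin=0.0, k=3)  # 3-fold HPC activity
phi_hpc = np.angle(analytic_signal(hpc))                   # instantaneous HPC phase

# Synthetic stand-in for the EC amplitude envelope, peaking at HPC phase 0.
toy_envelope = 1.0 + np.cos(phi_hpc)

observed = binned_modulation_index(toy_envelope, phi_hpc)
baseline = surrogate_baseline(toy_envelope, phi_hpc)
```

      With nine bins tiling [-π, π], the middle bin (index 4) is centered on phase 0, so in this toy setup `observed[4]` exceeds `baseline[4]`. Real data would replace the grid, the β weights, and the envelope with per-voxel estimates.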

      Results: Phase synchronization between the HPC and EC activity

      To examine whether the spatial phase structure in one region could predict that in another, we tested whether the orientations of the 6-fold EC and 3-fold HPC periodic activities, estimated from odd-numbered sessions using sinusoidal modulation with rotationally symmetric parameters, were correlated across participants. A cross-participant circular correlation was conducted between the spatial phases of the two areas to quantify the spatial correspondence of their activity patterns (EC: purple dots; HPC: green dots) (Jammalamadaka & Sengupta, 2001). The analysis revealed a significant circular correlation (Fig. 4a; r = 0.42, p < 0.001), as reflected by the continuous color progression across the participants (i.e., the colored lines connecting each pair of the EC and HPC dots in Fig. 4a), suggesting that participants with smaller hippocampal phases (green, outer ring) tended to have smaller entorhinal phases (purple, inner ring), and vice versa.
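
      For reference, the circular correlation coefficient of Jammalamadaka & Sengupta can be sketched in a few lines of Python. This is a generic illustration, not the authors' analysis code; the example phase values are made up, and any rescaling of the 6-fold and 3-fold phases to a common range is left out.

```python
import math


def circular_mean(angles):
    """Mean direction of a list of angles in radians."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def circular_correlation(a, b):
    """Circular-circular correlation between paired angles a and b (radians)."""
    a_bar, b_bar = circular_mean(a), circular_mean(b)
    sin_a = [math.sin(x - a_bar) for x in a]
    sin_b = [math.sin(y - b_bar) for y in b]
    numerator = sum(x * y for x, y in zip(sin_a, sin_b))
    denominator = math.sqrt(sum(x * x for x in sin_a) * sum(y * y for y in sin_b))
    return numerator / denominator


# Made-up phases for four participants (radians).
ec_phase = [0.1, 0.5, 1.0, 1.4]
hpc_shifted = [x + 0.3 for x in ec_phase]   # same ordering, constant offset
hpc_mirrored = [-x for x in ec_phase]       # reversed ordering

r_shifted = circular_correlation(ec_phase, hpc_shifted)
r_mirrored = circular_correlation(ec_phase, hpc_mirrored)
```

      A constant phase offset between the two regions leaves the coefficient at its maximum, while mirrored phases drive it to its minimum, which matches the reading that participants with smaller phases in one region tend to have smaller phases in the other.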

      In addition to the across-participant phase correlation, we further examined the spatial alignment between the 6-fold EC and 3-fold HPC activity patterns. Given that the spatial phase of the HPC is hypothesized to depend on EC projections, particularly along the three primary axes of the hexagonal code, we examined whether the periodic activities of the EC and HPC were spatially peak-aligned. Notably, unlike previous studies that focused on temporal coherence of neural oscillations (Buzsaki, 2006; Maris et al., 2011; Friese et al., 2013), our analysis focused on periodic coupling between brain areas in the directional space. To test spatial peak alignment between EC and HPC, a cross-frequency spatial coupling analysis (adapted from the amplitude–phase coupling framework; Canolty et al., 2006) was employed to identify at which HPC phase the EC exhibited maximal amplitude modulation. If the activities of both areas were peak-aligned (i.e., no peak offset), a strong coupling at phase 0 of the HPC would be expected, as shown by the one-cycle-based schema in Fig. 4b. To this end, the instantaneous phase of the HPC and the amplitude envelope of the EC were extracted from the reconstructed activity using the Hilbert transform (see methods for details). HPC phases were classified into nine bins, and the modulation index (M), quantifying the scalar coupling strength between EC amplitude and HPC phase, was computed within each bin. As a result, significant coupling was observed in the bin centered at phase 0 of the HPC (Fig. 4c; t(32) = 2.57, p = 0.02, Bonferroni-corrected across tests; Cohen’s d = 0.45). In contrast, no significant coupling was found in other bins (p > 0.05). To rule out the possibility that the observed coupling was driven by a potential harmonic (integer multiple) relationship between the 3-fold and 6-fold periodicities, we additionally conducted control analyses using 9-fold and 12-fold EC components. However, no significant coupling was observed in these controls (Fig. 4c; p > 0.05). Together, these results confirmed selective alignments of spatial peaks between the 6-fold EC and 3-fold HPC periodicity in the conceptual direction domain.
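The binning step described above can be sketched as follows. This is a toy illustration, not the authors' pipeline: the signals are synthetic, the function names are hypothetical, the analytic signal is built with a plain FFT rather than a library Hilbert routine, and the mean envelope per bin stands in for the per-bin modulation index M, whose exact definition is given in the paper's methods.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (keeps only non-negative frequencies)."""
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def amplitude_by_phase_bin(phase_signal, amplitude_signal, n_bins=9):
    """Mean envelope of one signal within phase bins of another."""
    phase = np.angle(analytic_signal(phase_signal))       # radians in [-pi, pi]
    envelope = np.abs(analytic_signal(amplitude_signal))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    index = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    return np.array([envelope[index == k].mean() for k in range(n_bins)])


# Synthetic example over one full cycle of movement directions.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
hpc = np.cos(3 * theta)                                   # 3-fold signal
ec = (1.0 + 0.5 * np.cos(3 * theta)) * np.cos(6 * theta)  # 6-fold, peak-aligned
profile = amplitude_by_phase_bin(hpc, ec)
```

With nine bins spanning -π to π, the middle bin (index 4) is the one centered at phase 0, so a peak-aligned pair of signals yields its largest mean envelope there.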

      Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses seem thoroughly done, and the results and simulations are very interesting.

      We thank the reviewer for the positive assessment of our work.

      We thank both reviewers again for their constructive and insightful feedback, which has substantially strengthened the manuscript.

    1. Summary Note: Emotional Logic in Children

      This summary analyzes talks by Catherine Aimelet-Perrisol, physician and psychotherapist, on the nature of children's emotions and the parental stance needed to support them.

      It draws on the "emotional logic" approach, which moves away from a purely psychological view toward a biological understanding of emotion.

      Executive Summary

      Emotion should not be seen as an overflow to be managed or repressed, but as a vital movement (e-movere) and a biological language signaling a need to exist.

      Building on the work of Professor Henri Laborit, this approach holds that each emotion (fear, anger, sadness, joy) follows a precise biological code aimed at survival and self-assertion.

      For the parent, the challenge is not to calm the child through coercion, but to listen to what the emotion says about the child's need for safety, identity, or meaning.

      The educational role thus shifts from a rigid framework to a flexible structure and a secure envelope, allowing the child to turn emotions into adaptive solutions rather than behavioral problems.

      --------------------------------------------------------------------------------

      1. Emotion: A Biological and Vital Process

      Etymologically, emotion is a "movement outward." Far from being a mere psychological phenomenon, it is a cellular and neuronal reaction rooted in living matter.

      Vital intention: Emotion expresses the child's vital drive. When a child screams or becomes agitated, the fundamental message is: "I exist."

      Breaking with "management": Trying to "manage" or control emotions is considered counterproductive.

      Emotion is a biological mechanism that imposes itself on the individual; it is therefore "true" by definition, even if the reaction seems inappropriate in the eyes of adults.

      A language to decode: Emotion is the language the child uses, often even before mastering words, to say something about their own existence and their relationship to the world.

      --------------------------------------------------------------------------------

      2. The Emotional Code: The Four Fundamental Categories

      According to emotional logic, each emotion is a specific signal responding to a precise need. Four broad categories are generally identified:

      | Emotion | Underlying need | Perception of the situation | Associated behavior |
      | --- | --- | --- | --- |
      | Fear | Safety | Perceived danger | Flight or avoidance |
      | Anger | Identity / Self-esteem | Threat or aggression | Fight or confrontation |
      | Sadness | Meaning / Understanding | Chaos or loss of meaning | Withdrawal / Protective bubble |
      | Joy | Expansion / Vitality | Opportunity / Reward | Externalization / Burst of life |

      Focus on specific functions:

      Fear: It makes it possible to anticipate the worst in order to prepare for it. It becomes a solution if the parent helps the child work out a strategy for the danger they feel.

      Anger: It serves as an outlet to protect the "self." The child seeks to be heard and to assert their identity within the relationship.

      Sadness: It creates a protective bubble (often observed during the COVID-19 period) against an outside world that has become incomprehensible.

      --------------------------------------------------------------------------------

      3. The Parental Stance: Presence, Structure, and Envelopment

      The parent is invited to move from the role of "rescuer" or "controller" to that of companion.

      Listening and reflecting back

      Instead of evaluating behavior, the parent should take an interest in the "how":

      Observation: Watch how the child goes about drawing or learning (e.g., a square moon is not a mistake, but an expression of what the child saw or imagined).

      Reflecting back: Hand the child's own tools back to them by showing that their approach has been noticed ("I see that you learn better while walking"). This strengthens their inner security.

      Structure vs. Framework

      The concept of a "framework" is often perceived as restrictive or as a source of conflict. Two other notions are preferred:

      1. Structure (or Architecture): A backbone that is both flexible and solid. It is the "uprightness" that allows the child to rise and discover their own rules.

      2. Envelopment: A protection needed when the child is helpless or overcome by immense sorrow. It is a presence that says: "I am here, I am listening."

      Education as Conduct

      Education (ducere) consists of teaching the child how to "conduct themselves" rather than imposing a conduct on them.

      Asking a child how they intend to behave in a given situation stimulates their neurons and develops their sense of responsibility.

      --------------------------------------------------------------------------------

      4. The Mystery of Development and Learning

      Every child is born with a singular "emotional tone" (rather anxious, combative, or joyful).

      The influence of the environment: Family culture can encourage or restrict certain emotions (e.g., "in our family, we don't cry").

      The child adapts or resists, which is part of the mystery of their personality.

      Learning as a path to security: There is no such thing as a child who does not want to learn.

      Understanding a concept or succeeding at a learning task is a major source of inner security.

      The common law: While the "how" (the method) is free and belongs to the child, the "what" (the need to learn the lesson, to respect social rules) falls under the law and the collective order, which are not negotiable.

      --------------------------------------------------------------------------------

      Conclusion: Emotion as a Solution

      Catherine Aimelet-Perrisol's approach concludes that emotion is never a problem in itself.

      It is a biological solution the body finds to express an unmet need.

      By validating the child's feelings ("Your body is telling the truth") without necessarily validating all of their factual interpretations, the parent creates a "win-win" relationship founded on recognizing the other's existence.

    1. This is a repository created by Vercel Labs dedicated to best practices for writing React, optimized so that AI agents and LLMs can understand and apply them. Purpose: to provide a structured, AI-readable set of guidelines on writing efficient React code, with a focus on performance optimization.

      Directory structure

      ├── rules/                    # Individual rule files
      │   ├── _sections.md          # Section metadata
      │   ├── _template.md          # Template for creating new rules
      │   └── area-description.md   # Specific rule files
      ├── src/                      # Build scripts
      ├── metadata.json             # Document information
      ├── AGENTS.md                 # Compiled output file (auto-generated)
      └── test-cases.json           # Test cases for LLMs (auto-generated)

    1. I like GitLab

      Summary: I Like GitLab

      • Initial Adoption: The author originally chose GitLab because it offered free private repositories when GitHub still charged for them, leading to a long-term workflow integration.
      • Integrated Container Registry: One of the most valued features is the built-in Docker registry, which eliminates the need for separate accounts, external access tokens, and concerns about Docker Hub pull limits.
      • CI/CD Maturity: GitLab's "config as code" (.gitlab-ci.yml) is praised for being versioned with the repo and offering extensive documentation, though the sheer volume of options can be overwhelming.
      • Runner Flexibility: While shared runners are reliable for free workloads, the author finds setting up custom runners on private VPS instances to be straightforward.
      • Performance Issues: The web interface is consistently described as sluggish and slow compared to GitHub, creating "constant friction" during long sessions.
      • Feature Bloat: GitLab attempts to be an all-in-one DevOps platform; while the author only uses about 10% of the features, they acknowledge the benefit of having advanced tools (like security scanning) available if needed.
      • Workflow Split: The author uses GitLab as a "digital workshop" for private, messy experiments and reserves GitHub for public-facing collaboration and visibility.

      Hacker News Discussion

      • Corporate Shift & Quality: Several users noted that since its IPO, GitLab seems to prioritize "enterprise checklist" features and AI over fixing long-standing bugs and improving general UI polish.
      • The "Sluggishness" Debate: A major point of discussion was GitLab's slow performance. Some attribute this to the "Ruby on Rails tax," though others pointed out that GitHub and Shopify also use Rails successfully, suggesting the issue lies in GitLab's specific architecture.
      • Rise of Alternatives: Many commenters mentioned switching to Forgejo or Gitea for self-hosting, citing significantly lower resource requirements (up to 90% less) and near-instant page loads.
      • The "80/20" Problem: Critics argued that GitLab often builds 80% of a feature to satisfy marketing requirements but leaves the remaining 20% of "polish" unfinished, leading to a "meme" of finding 5-year-old open bug reports for basic issues.
      • Storage Exploits: There was a technical side-discussion about the 10GB project limit; users noted that because the limit often applies per-layer rather than per-registry, it can sometimes be bypassed for very large images.
      • Website Appreciation: Many participants took a tangent to praise the blog's design, specifically its minimalist, terminal-like aesthetic and "markdown-as-markdown" presentation.
    1. The Code of Ethics is obligatory and disciplinary as well as aspirational and descriptive in that it defines the professional’s role. It is an integral educational resource regarding ethical principles and standards that are expected of audiologists, speech-language pathologists, and speech, language, and hearing scientists.

      To emphasize that the code of ethics is obligatory for both clinicians and researchers.

    1. We also may change how we behave and speak depending on the situation or who we are around, which is called code-switching.

      Ever since social media became popular, it’s been much harder to determine who’s genuine versus who puts on a show to conform to society. I think social media has become a breeding ground for loss of connection to self and persona.

    2. Since we have different personas and ways of behaving in different groups of people, what happens if different groups of people are observing you at the same time? For example, someone might not know how to behave if they were at a restaurant with their friends and they noticed that their parents were seated at the table next to them. This phenomenon is called “context collapse.” On social media, context collapse is a common concern, since on a social networking site you might be connected to very different people (family, different groups of friends, co-workers, etc.). Additionally, something that was shared within one context (like a private message), might get reposted in another context (publicly posted elsewhere).

      Context collapse is a phenomenon that has ruined many people's careers. Being able to code-switch allows for more interpersonal connections and often better outcomes. When others take advantage of these things, I believe it really breaks some rules.

    3. The way we present ourselves to others around us (our behavior, social role, etc.) is called our public persona. We also may change how we behave and speak depending on the situation or who we are around, which is called code-switching.

      Code switching happens, at least to me, unconsciously and it allows me to maintain healthy interactions and connections with people around me. I find it interesting that humans can adopt these different characters of themselves, it adds depth to who people really are.

    4. The way we present ourselves to others around us (our behavior, social role, etc.) is called our public persona. We also may change how we behave and speak depending on the situation or who we are around, which is called code-switching.

      As a person of color, I find myself code-switching relatively often. The culture I've been surrounded by growing up is incredibly different from others', and when I'm put in situations where I'm not talking with people who share a similar culture I tend to bottle up and switch to a different version of myself. A version of myself that's more approachable and respectful, a bit more timid. I don't intentionally do so most of the time, it's sort of just turned into a habit for me, as I'm sure it has for other people of color.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (R1)

      R1 General statement: Here, Escalera-Maurer and colleagues, present an up-to-date distribution of homologues of Hok toxic proteins belonging to the well-annotated, but otherwise functionally obscure, hok/Sok type I toxin-antitoxin system, across the RefSeq database. Although such computational analyses have been done in the past, the authors here find many more hok homologs than described before, and they categorise their distribution based on whether they are encoded on chromosomes, plasmids, or (pro)phages. These computational analyses are in general tricky with T1TAs, as their toxins are quite short (~50 amino acids, as is the case for Hok), which is why the authors here used three separate approaches to expand their search (nucleotide-level BLAST, protein-homology, or both combined with Infernal). The authors cluster the Hok homologues they find based on a 60% sequence identity cut-off (expanding the known clusters in the process), and proceeded to test 31 candidates belonging to 15 sequence-clusters for their toxicity in Salmonella Typhimurium LT2, showing that 30/31 were toxic upon induction. An interesting finding from their endeavours is that hok/Sok homologues are enriched within prophages and large plasmids, but are not enriched near bacterial anti-phage defense systems (in contrast to the SymE/SymR T1TA). The findings suggest that hok/Sok are indeed sometimes linked to phage and plasmid biology, although they might not be antiphage defenses per se (they have been clearly shown in the past to be addiction modules, and this is still clearly true).

      Authors' answer to R1 General statement: We do not state here that hok/Sok are not anti-phage defense systems; we simply observe that they do not cluster with anti-phage defense systems. We have also observed (unpublished data) that known defense systems do not systematically cluster together with other defense systems. Therefore, a strong association with other defense systems would have been a strong indication of a function in phage defense, but the fact that we did not observe any such association does not exclude their involvement in phage defense.

      R1_C1: My expertise lies towards the experimental side of the authors' work, I thus cannot comment on the accuracy/robustness of the computational analyses performed here. The authors do a fine job in clearly stating their findings overall; I could follow most of the conclusions, and I deemed that most of them were supported by their work. Additionally, I find that this paper is a missed opportunity to uncover even more novel biology connected to the interesting hok/Sok T1TAs. The paper does not provide a new framework to think about what is the function of the chromosomal/prophage hok/Sok T1TA systems, although I realize that this is very difficult to accomplish, especially when considering that hok/Sok systems have been around in the literature for almost 40 years.

      Authors' answer to R1_C1: We agree with the reviewer, as we indeed performed this analysis with the aim of clarifying the role of hok/Sok systems. However, we still believe that our extensive survey of Hok loci brings to light their enrichment in various mobile genetic elements, such as prophages and large conjugative plasmids, which is undoubtedly linked to their function. In addition, our study will guide future experimental efforts to uncover the function of these systems, for example by helping researchers select relevant homologs to test for a specific function.

      R1_C2: My major comment is in regard to the Hok toxicity assays (Fig. 2). The authors state in the discussion that "Hok peptides originating from chromosomes are as toxic as those from plasmids", but I believe that the way that they tested their constructs might not have allowed them to see toxicity differences between the two groups. Specifically, using the multi-copy plasmid pAZ3 (pBR322 origin of replication; ~15-20 plasmid copies per chromosome) to induce the different Hok toxin homologues in Salmonella Typhimurium LT2 with arabinose might have masked toxicity differences that would otherwise be apparent on the chromosomal expression-level.

      Some of the authors themselves have previously used the FASTBAC-Seq method to study the Hok homologue from plasmid R1, a useful technique during which a toxin is integrated in the chromosome, in order to study their toxicity under natural levels of expression. I believe that an ideal scenario would be to apply FASTBAC-seq to some of the 31 Hok homologues described here (e.g., a subset of plasmidic vs chromosomal Hok homologues) to shed light on potential toxicity differences between the Hok clusters. This would increase the value of the presented study.

      Alternatively, the authors could employ an L-arabinose concentration gradient to titrate the expression levels of the Hok toxins in order to potentially see different toxicity levels from the different homologues. However, this is not going to work in the system as they are using it now for two reasons:

      a) the S. Typhimurium LT2 (STm) used here has its arabinose utilization operon intact (araBAD), which means that Salmonella can catabolize arabinose to use it as a carbon source. This catabolization process interferes with the arabinose induction (i.e., Salmonella eats arabinose instead of using it as the Hok inducer). To ameliorate this, the authors could delete the araBAD operon in STm, rendering STm incapable of catabolizing arabinose, and repeat the experiments in that strain. Or use E. coli BW25113 as the expression host, which already has the araBAD operon deleted (it is not clear to me why the different Hok homologues would not be toxic in E. coli, as the different Hok homologues are widely diverse in sequence, as the authors found here).
      b) Even with the araBAD operon deleted, the arabinose induction would be bimodally on or off in the population, due to the bimodal expression of the arabinose transporter (AraE; see Khlebnikov et al., 2002). This would again not allow for titratable arabinose-inducible expression from different concentrations of arabinose. The solution for this would be to co-express a separate plasmid with araE, which would render every cell the same in regards to arabinose permeability, and thus the system would be titratable (as explained in Khlebnikov et al., 2002). Therefore, if the authors would be interested to go towards this route, they would have to first delete the araBAD from STm, then transform STm with an araE plasmid, and redo the experiments. In addition, I would propose to the authors to use the drop plate method (agar plate-based), which is more sensitive compared to the liquid assays employed here.

      Having said all that, I understand that all this experimental work would be strenuous and time-consuming, and although I would like to see it happen, this is not my paper. I would be content therefore if the authors toned down the claim that plasmidic vs chromosomal Hok homologues have the same toxicity, and discuss that chromosomal levels of toxicity are an important caveat that has not been explored here.

      Authors' answer to R1_C2: We thank the reviewer for the detailed suggestion on how to better assess toxicity differences by using an araBAD deletion mutant overexpressing araE. We repeated the arabinose induction assays using drop assays and strain BW25223 with plasmid pJAT13araE and our pAZ3-based plasmid carrying Hok CDS homologs. However, we obtained similar data and were not able to distinguish between the toxicity of chromosomal versus plasmidic CDS, even using different concentrations of arabinose. This is probably because low concentrations of the Hok protein are sufficient for activity; moreover, by expressing the protein directly we are bypassing all post-transcriptional silencing by the native hok mRNAs, and we are using a multicopy plasmid. We have now included 0.01% arabinose induction drop assays in the manuscript, as the data obtained with other arabinose concentrations did not provide new information. In any case, we are still not accessing the native expression levels, for the following reasons: 1/ chromosomal levels of toxicity were not explored here and 2/ only the toxicity of the coding sequence, not of the full mRNA, was tested. Indeed, we do not know the exact sequence of the hok homolog mRNAs, and this is beyond the scope of the study. These remarks have been added to the discussion.

      We agree that the sentence "Hok peptides originating from chromosomes are as toxic as those from plasmids" was too strong and we have added the caveats of our experimental design in the discussion. While we indeed did not compare the toxicity of the peptides, we still showed that chromosomal Hok can be toxic upon overexpression, which would not be the case if the sequences were degenerated.

      The reviewer also suggests the use of the FASTBAC-Seq method, which we previously used to study Hok from the R1 plasmid and which is designed to study type I toxins at the native expression level. While FASTBAC-Seq identifies loss-of-function mutants of the systems, it does not allow one to determine differences in toxicity between systems per se. In addition, FASTBAC-Seq has always been performed in the context of the full mRNA, not only the coding sequence, and these sequences are presently unknown for most homologs.

      Other comments:

      R1_C3: a) There is barely any discussion of the Sok component (RNA antitoxin) of the homologues; why is that? Could you please discuss Sok differences across the homologues, or at least explain why this is not discussed at all in the paper (e.g., in the discussion)?

      Authors' answer to R1_C3: Identifying the Sok RNA sequence is not trivial, which is why it was not done in this study. A paragraph explaining this was added to the discussion.

      R1_C4: b) In the results section, the Hok clusters are referred to as 62 in number ("Because Hok sequences were too short and variable to construct a meaningful phylogenetic tree, we clustered the Hok sequences with a 60% identity threshold and obtained 62 clusters"), but then in the discussion section, the cluster number becomes 74 ("We highlighted the high sequence variability within Hok peptides by obtaining a total of 74 clusters with 60% identity (Fig. S7)."). Which one is the right number, and why is there a discrepancy?

      Authors' answer to R1_C4: We apologize for the discrepancy between the numbers. The first number corresponded to the Hok hits from RefSeq; we then added the Hok hits from the plasmid and virus databases (an analysis performed later in the manuscript). We clarified this information in both the results and discussion texts (61 clusters from RefSeq and 79 in total; 74 was a typo).

      R1 Significance: The most well-clarified aspect of the paper presented here is the distribution of Hok homologues, with the novel aspect of the location in which the hok/Sok T1TAs reside (i.e., chromosome, plasmid, or phage). There is room for the molecular genetics part to be developed further, as I discussed earlier, however this study is the most up-to-date characterization of the diversity of Hok homologues, and will be of interest to the T1TA and the general toxin-antitoxin field.

      Reviewer #2 (R2)

      R2 General statement: The authors examined how the Hok toxins are spread across bacterial genomes. The manuscript including its figures is hard to read and understand. I commented figure 1 in details, but similar comments apply to the other figures. Overall, the data lack clarity and precision. Finding information about sequences, clusters in the supplementary materials was not easy. The manuscript should be thoroughly revised. In addition, I believe that other aspects should be developed to expand the interest of the study, such as the co-occurrence of multiple systems in chromosomes, on plasmids and whether they are able to crosstalk. This might provide some evolutionary insights into the biology of these toxins.

      Authors' answer to R2 General statement: We designed all figures according to established standards for scientific data visualization, although we recognize that different presentations may work better for different audiences. In our detailed response to Figure 1A, we explain how UpSet plots are constructed and interpreted, which we hope clarifies the visualization approach for the full dataset. We are open to discussing specific improvements if the reviewer has suggestions for enhanced clarity. To address concerns about accessibility, we want to clarify that all sequences are compiled in Table S1 with their clus100 identifiers, making them easy to locate. We are open to reorganizing supplementary materials if a different structure would be more user-friendly. Finally, we agree that an extensive analysis of co-occurrences and crosstalks would be valuable. However, predicting crosstalk bioinformatically for all genomes presents challenges, as it would require predicting RNA:RNA interactions between hok mRNA and Sok sequences, which are currently unknown. Given these limitations, this analysis was beyond the scope of the current study.

      R2_C1: The introduction lacks information regarding the Hok protein (size, structure prediction, localization) as well as a bit of explanation about the reason of looking at these toxins. The description of the potential roles should be a bit expanded.

      Authors' answer to R2_C1: Following the comment from the reviewer, we have provided additional information about Hok in the introduction.

      R2_C2: When the authors talk about 'loci', they mean genes encoding Hok homologs if I understand correctly. They did not look for the Sok sequences (hok-sok loci).

      Authors' answer to R2_C2: Indeed, we did not look for the Sok sequences; we describe only Hok homolog loci, which may either encode or lack a Sok homolog.

      R2_C3: It is not clear what the authors did with the sequences for which they could not detect a start codon and a SD (although it is unusual to refer to SD in the context of protein sequence)

      Authors' answer to R2_C3: The peptides were annotated by extending the initial hit until the first start codon. Therefore, all annotated peptides have a start codon. Shine-Dalgarno sequences were annotated when confidently predicted, to provide additional information. Sequences were not excluded based on the presence or absence of the SD.

      R2_C4: Figure 1A is not clear. The total of the bars equal 32,532 which is the number of 'loci' detected by the combination of the different methods. However, it is not clear to me how many are redundant. For instance, I suppose that all the 8483 sequences that were retrieved using blastn and Infernal were retrieved using MMseqs2, blastn and Infernal. So, what is the actual number of sequences that were found? When the authors talk about 1264 distinct peptides, what do they mean? What are the numbers on the X axis (18209, 2260, 27728)?

      Authors' answer to R2_C4: Figure 1A is a very typical "UpSet" plot, as indicated in the legend (A. Lex, N. Gehlenborg, H. Strobelt, R. Vuillemot and H. Pfister, "UpSet: Visualization of Intersecting Sets," in IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 1983-1992, 31 Dec. 2014, doi: 10.1109/TVCG.2014.2346248). UpSet plots are a data visualization method for showing data with more than two intersecting sets. The Hok sequence hits were obtained by three different methods, listed in the rows (MMseqs2, blastn and Infernal): 18209 is the number of hits by MMseqs2, 22680 the number of hits by blastn and 27728 the number of hits by Infernal. The columns show the intersections between these three sets. For example, the 8483 sequences mentioned (second column) were found only by blastn and Infernal but not by MMseqs2. The actual total number of sequences found is indeed 32,532. The 1264 distinct peptides are peptides with different sequences. After removing false positives, degenerated sequences and small peptides, we obtained 1264 unique Hok sequences that are found in the 32,532 bacterial loci.
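The column logic of an UpSet plot can be sketched in a few lines. This is a toy illustration with made-up hit identifiers, not the study's data, and the function name is hypothetical: each column counts the hits found by exactly that combination of methods and by no other, so the columns are non-redundant and sum to the size of the union, while the row totals are simply the sizes of the individual sets.

```python
from itertools import combinations


def exclusive_intersections(named_sets):
    """Count items found by exactly each combination of sets (UpSet columns)."""
    names = list(named_sets)
    result = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(named_sets[n] for n in combo))
            others = [named_sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            only = inside - outside  # in every set of combo, in no other set
            if only:
                result[combo] = len(only)
    return result


# Hypothetical hit identifiers for three search methods.
hits = {
    "MMseqs2": {1, 2, 3, 4},
    "blastn": {3, 4, 5, 6},
    "Infernal": {4, 5, 6, 7, 8},
}
columns = exclusive_intersections(hits)
row_totals = {name: len(found) for name, found in hits.items()}
total_unique = len(set().union(*hits.values()))
```

In this toy case the blastn-and-Infernal column contains only the hits missed by MMseqs2, which mirrors the reading of the 8483-sequence column given in the answer above.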

      R2_C5: About Infernal: first the authors are stating that only 8% of the sequences are lost when not considering the mRNA structure - which they seem to consider as negligible. Then in the next section, they state that Infernal is the best tool at identifying clusters that are not detected otherwise. Seems a bit contradictory.

      Authors' answer to R2_C5: We thank the reviewer for pointing out this apparent contradiction, which we have clarified in the revised manuscript. Infernal uses both sequence and structure information simultaneously for homology detection. Although only 8% of Infernal's hits are detected uniquely when structural information is considered, these sequences account for 9 additional clusters with notably high sequence diversity, which would otherwise have gone undetected. Therefore, we believe that Infernal is the best tool to capture novel cluster diversity.

      R2_C6: Cluster determination. The threshold was put at 60% identity. What is the rationale for the 60% identity? Given that the Hok sequences (like toxins and antitoxins from TA systems in general) are highly variable, this leads to a high number of clusters. I'm not sure of the relevance of these clusters. Are there any other criteria to define clusters?

      Authors' answer to R2_C6: We selected 60% identity as a balance between capturing sequence diversity and generating interpretable results. We also tested 70, 80 and 90% and obtained 128, 221, 377 clusters, respectively, which would be too many for a meaningful visualization and interpretation. The best clustering method would be constructing a phylogenetic tree. However, as explained in the discussion, because the high sequence diversity prevented the construction of a reliable phylogenetic tree, clustering was used as an alternative strategy to identify and interpret patterns of sequence variability.

      __R2_C7: __The authors claim that most of the Hok diversity is found on chromosomes. However, the number of chromosomal Hok is higher than that located on plasmids, which might be related to the different sizes of the different replicons ie, chromosomes being larger than plasmids. Is there a way to normalize by determining the density per size?

      Authors' answer to R2_C7: We do not claim that chromosomes contain most of Hok diversity, as this would indeed be influenced by biases in the databases. We are just describing that we found most of the diversity in chromosomes, but we cannot conclude whether this is a true representation of the frequencies in nature.

      R2_C8: '46 of the 62 clusters contained 10 or less distinct sequences and might be in the process of degenerating'. The authors also linked this with SD detection. Please explain. From what was indicated earlier, I understand that sequences with premature stop codons or short sequences (

      Authors' answer to R2_C8: We did not remove sequences for which we could not predict the SD. Indeed, lacking SD is a sign that the hok mRNA might not be able to play its biological role and would be indicative that the sequences have degenerated. To evaluate this hypothesis, we experimentally tested 5 sequences without a predicted SD and two of those were not toxic (see Table S2). In order to assess if the low abundant clusters contained degenerated sequences we experimentally tested representatives from some of the clusters with only one Hok CDS and found most of them to be toxic.

      R2_C9: 'Only 7.3% of the unique sequences were found on both plasmids and chromosomes'. From this observation, the authors conclude that 'there is little stable transfer from chromosomes to plasmids or vice-versa'. I don't understand what this means. Do they mean identical sequences? The fact that sequences differ from chromosomes to plasmids does not rule out 'stable transfer'. What do they actually mean by stable transfer? Once the gene is horizontally transferred, it is fixed and vertically transmitted? Same comments apply to the inter-genera horizontal transfer by plasmids.

      __Authors' answer to R2_C9: __Due to the impossibility of constructing a reliable phylogenetic tree, we used identity of sequences across different localizations or genera as our marker for recent, stable transfer events. We define stable transfer as the persistence of sequences in an unchanged form following horizontal transfer; long enough to be detected in current databases. Our approach likely underestimates total transfer events, as sequences accumulating mutations after transfer would not be captured. We would expect to observe numerous identical sequences across plasmids and chromosomes if frequent exchange were occurring, unless rapid mutation after the transfer prevented their detection as identical sequences. We have added a sentence to clarify this in the manuscript and removed the term stable transfer.

      __R2_C10: __I don't understand the next section about 'family'. What do the authors mean by 'family'? Genera? The same applies to the next section about the Y to C recoding. Did the authors do point mutations in the conserved amino acids/codons to test whether they are important for toxicity? Some Hok variants lack some of the conserved amino acids and are toxic (under overexpression conditions in Salmonella). What about T18, C31 and E42?

      Authors' answer to R2_C10: Families (Enterobacteriaceae, Vibrionaceae etc... ) and genera (Escherichia, Salmonella etc...) refer to the taxonomic categories. Following the reviewer comment, we experimentally assessed the toxicity of Hok from R1 plasmid after mutating the conserved amino acids to alanine residues. All the mutants were found to be toxic under our expression conditions.

      __R2_C11: __The prevalence of Hok in chromosomes or on plasmids might depend on various confounding parameters, such as the size, number of sequences available among others. The authors should find methods to correct for all that.

      Authors' answer to R2_C11: Normalization would indeed be needed if we were comparing the prevalence on chromosomes vs the prevalence on plasmids. Here, we do not claim that Hok homologs are more prevalent in plasmid or chromosomes and only describe where we found them.

      __R2_C12: __Link with defense systems. The threshold was set at 20 kb. Why this threshold?

      Authors' answer to R2_C12: The size of defense islands in a previous report was approximately 40 kb, by setting up a 20 kb threshold we searched for defense systems in a region of 40 kb adjacent to each of the homologs (https://doi.org/10.1126/science.aar4120). If the specific homolog was part of a defense island we would expect that it is less than 20 kb apart from any defense system.
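
The proximity criterion described above can be sketched as a simple interval check. The coordinates, system names and the function `systems_within_window` are hypothetical, and the sketch treats replicons as linear (it ignores wrap-around on circular replicons); it is not the authors' pipeline.

```python
def systems_within_window(homolog, systems, window=20_000):
    """Return names of defense systems less than `window` bp from a homolog.

    `homolog` is (replicon, start, end); each system is
    (name, replicon, start, end). Only systems on the same replicon count.
    """
    h_replicon, h_start, h_end = homolog
    nearby = []
    for name, replicon, start, end in systems:
        if replicon != h_replicon:
            continue
        # Distance between the two intervals; 0 when they overlap.
        gap = max(start - h_end, h_start - end, 0)
        if gap < window:
            nearby.append(name)
    return nearby


# Toy coordinates (assumption: 1-based positions on a linear replicon).
hok = ("chr", 100_000, 100_150)
defense_systems = [
    ("sysA", "chr", 110_000, 113_000),      # 9,850 bp downstream
    ("sysB", "chr", 150_000, 155_000),      # 49,850 bp downstream
    ("sysC", "plasmid", 100_200, 101_000),  # different replicon
    ("sysD", "chr", 70_000, 85_000),        # 15,000 bp upstream
]

nearby = systems_within_window(hok, defense_systems)
```

Searching 20 kb on each side of a homolog covers a 40 kb region in total, which matches the island size cited in the answer.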

      __R2 Significance: __The paper in its current state appears to serve the role of a data repository rather than a thorough and original analysis. It requires extensive revisions before it can be of interest to experts in the toxin-antitoxin field.

      __Reviewer #3 (R3): __

      R3 General statement: In the manuscript, "The Hok bacterial toxin: diversity, toxicity, distribution and genomic localization," Escalera-Maurer et al. investigate the distribution of Hok type I toxin proteins across bacterial species. The Hok-Sok type I toxin-antitoxin system was first described on plasmids where it serves to maintain the plasmid in a population of bacterial cells: translation of the hok mRNA is prevented via the small antitoxin RNA Sok. Upon plasmid loss, with no new transcription of sok, the highly stable hok mRNA is translated into a small protein, killing the plasmid-less cell. Homologues to the system were identified in the chromosome of E. coli in the 1990s, and subsequent analyses have identified identical systems in other bacterial chromosomes, though they are close relatives to E. coli. Given the increased number of bacterial genomes sequenced, the group examined how widespread Hok may be across bacteria. They used a combination of BLASTn, MMseqs2 (protein) and Infernal (RNA) to identify, as best possible, all possible homologs. They then used sequence identity cut-offs to form Hok "clusters," and identified key features of the clusters as well as tested toxicity of overproduction of 31 homologs in a strain of Salmonella. Overall, through a variety of bioinformatic predictions and analyses, the manuscript identifies an expanded number of Hok members not previously identified and broadens the range of species in which it is found, supports that Hok is not associated with defense systems, and provides additional support that horizontal transfer of hok genes is likely via plasmids (where hok is presumed to have originated).

      Major comments: There are some areas of the text that are a bit too definitive (these can be fixed or better explained in the text) and a few questions raised about the analyses and interpretations.

      Authors' answer to R3 Major Comment: As suggested by the reviewer, we rephrased parts of the manuscript.

      __These are the specific comments: __

      Introduction R3_C1: First paragraph: "Toxin production leads to the death of the cell encoding it" For many chromosomally encoded systems, toxicity has only been observed via artificial overexpression. This is an important point, as for many systems, a true biological function remains unknown. Further, add caveats regarding toxin function (for systems with validated function, they are involved in...). Again, there are still many questions for many t-at systems, in particular the Type I systems.

      __Authors' answer to R3_C1: __Indeed, the function of type 1 TA, in particular chromosomal ones, is still a matter of debate. While for hok/Sok R1, we previously showed death by expression at the chromosomal level, this was not shown for all TA (Le Rhun et al., NAR, 2023). We now state that toxin production could lead to the death or growth arrest of the cell, and we incorporated the reviewer's suggested changes in the part on function.

      __R3_C2: __Introduction: type I's are more narrow in distribution, but much of this is due to their size and lack of biochemical domains. Again, please clarify more here.

      __Authors' answer to R3_C2: __We added the reviewer suggestion to the text.

      __R3_C3: __Introduction: while Hok's have been found on chromosomes, in E. coli strains, there is clear evidence that many are inactive. This comes up in the discussion, but it is worth including briefly in the introduction.

      Authors' answer to R3_C3: We have now added in the introduction that in the K12 laboratory strain, most chromosomal hok/Sok were found to be inactive.

      __R3_C4: __For the predicted transmembrane domain: it would be worth to include a box/indication as to where that is within the peptide (with the understanding it may not be exact). Is there more/less variation here? I'm assuming all clusters/family have a predicted TM domain?

      __Authors' answer to R3_C4: __When predicting the TM domain with DeepTMHMM 1.0 (https://services.healthtech.dtu.dk/services/DeepTMHMM-1.0/), 227 of the 1264 unique Hok sequences are predicted to have a TM (transmembrane) domain, 7 an SP (signal peptide) and a TM, and 1025 an SP. When predicting the TM of the consensus sequence (most abundant amino acid) shown in Fig. 1D, the region A8 to L25 is predicted to be inserted in the membrane, with the N-terminus inside and the C-terminus outside.

      __R3_C5: __What is the cutoff for being a Hok? Did they take the "last hit" and use that in additional searches to see if more appeared? If that was done, and the search was exhaustive, this really important to add for the reader.

      Authors' answer to R3_C5: The MMseqs2 search was performed using 5 iterations, as indicated in the M&M, meaning that the hits of each search were used to search the database again, five times in a row. Importantly, an attempt to increase the number of iterations to 10 did not significantly increase the number of hits. Therefore, at least for the MMseqs2 search in the RefSeq database, we are close to being exhaustive.

      __R3_C6: __Figure S4: the authors state that there was no difference in the degree of toxicity between the clusters. There do appear to be some peptides tested that at the arabinose concentration used did not repress growth as immediately as others. If higher arabinose concentration is used, does that eliminate these differences? OR are many of these suppressors-if diluted back again, do they grow as if they are non-toxic in arabinose?

      Authors' answer to R3_C6: As suggested by Reviewer 1 (R1_C2), we performed titration of arabinose in a system overexpressing araE in a ΔaraBAD but were not able to find difference of toxicity in our conditions, see also our answer to R1_C2.

      __R3_C7: __Discussion: "because non-functional homologs are expected to quickly accumulate mutations..." is a bit problematic. Hok is highly regulated, as are some of the other well-described type I toxins. In MG1655, while the coding sequence may be intact, there are other mutations and/or insertion elements that prevent expression (and, by extension, function). Given the lack of consensus data for type Is, it is best to provide more context for this. If the authors wish to argue that they should quickly accumulate mutations, it would be good to provide additional rates/evidence (even for other loci) from the Enterobacteriaceae.

      __Authors' answer to R3_C7: __We agree this statement might need to be supported further. We have removed this sentence to address this concern.

      __Minor comments: __

      __R3_C8: __For the sequences used in the search: please provide the sequence used in addition to the reference to the T1TAdb. Was the full-length hok mRNA, including mok, used? Please provide the nucleic acid sequence (and include description of whether full-length, etc.) in Materials and Methods or in Supplemental.

      __Authors' answer to R3_C8: __Sequences and code were deposited on https://gitub.u-bordeaux.fr/alerhun/Escalera-Maurer_2025. The files named curated_Hok.fasta and hok.fa, corresponding to Hok protein and mRNA sequences respectively, are available in the file "T1TAdb input".

      __R3_C9: __60% identity was used for clustering. Did this become a problem-meaning separation of same property amino acid?

      __Authors' answer to R3_C9: __We checked amino acid signatures for each cluster (Fig S2), but could not find anything relevant.

      __R3_C10: __Fig. S2: for the clusters shown, please add in HokB, HokE, etc., to better correspond to Figure 1 in the main text.

      __Authors' answer to R3_C10: __The clusters were annotated according to the suggestion.

      __R3_C11: __Fig S1: this figure is challenging to orient-what are the numbers (8_10_85)?

      Authors' answer to R3_C11: The figure was generated using the CLANS tool, with each unique sequence retrieved by our analysis shown as a dot. Hok homologous sequences are in red and cluster together; the outlier clusters are annotated with the numbers corresponding to their 60% identity cluster. We understand that separating the numbers with an underscore could lead to confusion, so we now separate them with a comma.

      __R3_C12: __Please make a separate table or sheet for the experimentally tested peptides. Table S1 is quite large and a separate table/sheet would make this easier to find. If possible, please give the files names a more descriptive title (Table S1 in the name for example). This may be an issue with Review Commons but the individual file names were non-descript and the descriptions on the webpage did not indicate what the file contained.

      __Authors' answer to R3_C12: __We named the files Table S1 and File_S1 to S7. We added a Table S2 with the experimentally tested peptides. Note that identical peptides can sometimes be found in several bacterial loci.

      __R3_C13: __Figure S9: the black arrow for Hok is hard to see; it appears that the long grey bar going through multiple loci is indicative of Hok. Perhaps label this differently to make it easier on the reader (the line initially seemed to be a formatting issue and not indicative of the position of Hok).

      __Authors' answer to R3_C13: __We have now added a new label to indicate where Hok is, and clarified it in the figure legend.

      __R3_C14: __While the authors focused on Hok for this approach, which is fine and appropriate, can they comment at all about where mok is there in these new clusters/sub-families? Sok potential?

      __Authors' answer to R3_C14: __We added a paragraph about Mok in the discussion.

      __R3 Significance: __Overall the paper is a sound bioinformatic exercise and is improved with the testing of numerous "new" Hok proteins. Most of the comments can be done with some clarifications and maybe some additional analyses and/or verification which should take minimal time. The authors are over-emphatic at points as indicated and need to be more careful and precise with their language.

      In terms of advancement, it advances the distribution of these systems and adds to the depth of sub-classes. The audience will be more specialized to those who study these systems.

      Expertise: I have been studying type I toxin-antitoxin systems since the mid-2000s. We published a study examining (and mentioned well by this article!) the distribution in chromosomes of type I toxin-antitoxin systems, identified brand-new systems (that were chromosomally-limited at the time). My lab has continued to study regulation of type I toxins and distribution of chromosomally-only-encoded systems (so not Hok).

    1. What I'd really like to see is some kind of iframe that pins JS/wasm code within it to a particular bundle hash and prevents modification at runtime (even from chrome extensions). Something more like a TEE inside the browser of sorts.

      So you want people to let you run code on their machine that makes it answer to you—some random nobody—instead of the person who is using, and very likely owns and paid for, the device in question.

      Perhaps you would next like to see your neighbor just give you their car and convince some local businesses to let you take over their employees, shops, cash registers, and other equipment to put them to work for you as well.

    1. Blogger Fabrizio Ferri Benedetti on their 4 modes of using AI in technical writing:
       - watercooler conversations, to get code explained
       - text suggestions while writing/coding (esp. for repeating patterns in your work)
       - providing context / constraints / intent to generate first drafts, restructure content, or boilerplate commentary etc.
       - a robotic assembly line, to do checks, tests and rewrites. MCP/skills involved.

      Not either/or but switching between modes

    1. In 2016, when Donald Trump was running a campaign to be the US President, one Twitter user pointed out that you could see which of the Tweets on Donald Trump’s Twitter account were posted from an Android phone and which from an iPhone, and that the tone was very different. A data scientist decided to look into it more and found: “My analysis … concludes that the Android and iPhone tweets are clearly from different people, posting during different times of day and using hashtags, links, and retweets in distinct ways. What’s more, we can see that the Android tweets are angrier and more negative, while the iPhone tweets tend to be benign announcements and pictures. … this lets us tell the difference between the campaign’s tweets (iPhone) and Trump’s own (Android).” (Read more in this article from The Guardian) Note: we can no longer run code to check this ourselves because first, Donald Trump’s account was suspended in January 2021 for inciting violence, then when Elon Musk decided to reinstate Donald Trump’s account (using a Twitter poll as an excuse, but how many of the votes were bots?), Elon Musk also decided to remove the ability to look up a tweet’s source.

      This analysis intrigued me: it was the first time I realized that a data scientist could reasonably infer which tweets were likely to have come from Trump himself and which from his campaign. It shows that even seemingly simple metadata can contain very strong behavioral signals. It also made me realize that platforms are not neutral technological spaces, but systems that are influenced by power, economic interests, and individual decisions.

    1. With your right hand, you physically hit Control + S on the keyboard.

      Haptic Grounding. The act of physically pressing the keys is a "Neural Anchor." It tells your brain: The data is secure. It will not be lost. This satisfies the threat-detection system. By mentally tagging the code "Logic Saved," you give yourself permission to fully engage your "Social Engagement System" (the yellow light). You cannot listen while you are afraid of losing your place in the code.

    1. a "failed" trace from production, convert it into a test case with one click, and run it against new code to ensure the bug doesn't reappear.

      This is that flywheel moat of integrating errors + convert the better answer to a golden dataset along with helping non-technical users interact with technical ones

    1. The ecosystem of asynchronous coding agents is rapidly evolving, with each offering different integration points and capabilities:
       - GitHub Copilot Agent: Accessible through GitHub by assigning issues to the Copilot user, with additional VS Code integration
       - Codex: OpenAI's hosted coding agent, available through their platform and accessible from ChatGPT
       - OpenHands: Open-source agent available through the All Hands web app or self-hosted deployments
       - Jules: Google Labs product with GitHub integration capabilities
       - Devin: The pioneering coding agent from Cognition that first demonstrated this paradigm
       - Cursor background agents: Embedded directly in the Cursor IDE
       - CI/CD integrations: Many command-line tools can function as asynchronous agents when integrated into GitHub Actions or continuous integration scripts

      A list of async coding agents in #2025/08 github, openai, google mentioned. OpenHands is the one open source mentioned. mentions that command line tools can be used (if integrated w e.g. github actions to tie into the coding environment) - [ ] check out openhands agent by All Hands

    1. Further Reading: I’m not gonna pretend to be an expert here (any more than I’m an expert Obsidian plugin developer :p) but here are some resources that helped me figure out Claude Code. Kent writes a lot about how he uses Obsidian with Claude Code. This is an incredible hub of resources for using Claude Code for project management, by someone who also uses Obsidian. This take on Claude Code for non-developers helped solidify my understanding of how it all works; it hallucinates less, for one thing. Eleanor Berger has fantastic tips for working with asynchronous coding agents and is incredibly level-headed about the LLM landscape. This article does a great job of breaking down all the nitty-gritty of how Claude Code works. Damian Player has a step-by-step guide on using Claude Code as a non-technical person that goes into more depth. Here’s a tutorial from a pro that breaks down best practices for using Claude Code, like the importance of planning and thinking things through, and exactly why a good CLAUDE.md file matters.

      Links w further reading wrt Claude Code and Obsidian. Most of these are links to X. Ugh.

    2. Little Tips for Claude Code + Obsidian

      Some tips on her usage of Claude Code:
       - Put all your work in a folder next to the Obsidian folder.
       - Treat skills and commands like functions. Don't ever repeat them.
       - Install and use git locally to have a commit history.
       - On each step where you need to correct Claude Code, tell it to write down directions or rules to avoid the mistake in the future.
       - Circumvent public API limits by changing the query slightly, or hit it in parallel.

    3. Setting Claude Code Up in Obsidian: I was genuinely surprised at how easy the terminal plugin was to install for Obsidian. In Obsidian, I went to community plugins, searched for “terminal,” and installed the Terminal plugin by polyipseity. Then I clicked the “open terminal” button on the left-hand side. That’s it. There’s a dedicated Claudian plugin (subtly different from the Claudsidian solution people), but the Terminal felt a little higher fidelity to how I’m used to doing things, and a little simpler to understand. Plus, Claudian looks great but honestly I don’t think I can live without plan mode, which the readme says it doesn’t currently support. Plan mode is nice because it asks questions, really thinks things through, and can be trusted not to do dumb destructive things.

      There is a terminal plugin for Obsidian that you can connect to Claude Code (apparently). She advises against the Claudian plugin because it lacks plan mode (i.e. it does not act immediately).

    4. If you have been following along with me for years you know I don’t hype things just because people are hyping things. But Claude Code finally has made AI a core part of my processes instead of just a thing I use sometimes as an extra source or bonus spell checker or quicker way to reformat files.

      She feels Claude Code is now a core tool in her workflows

    5. The UI feels so intuitive, like an old-school MUD.

      UI? Are we still talking about the terminal? Ah no, she means the desktop version, see [[Claude Code for VSCode - Visual Studio Marketplace]] for the VScode plugin as well.

    1. Sometimes in programming, we want to group several steps (i.e., statements) together. When we group these steps together we call it a code “block.” These blocks of code are often used with conditionals (e.g., if this condition is true, do these five steps), and with loops (e.g., for each of these items, do these five steps).

      Explaining code blocks this way helps clarify how automated actions are grouped and repeated in bot behavior. When blocks are combined with conditionals and loops, it becomes clear how a single decision rule can lead to large-scale repeated actions, which is especially relevant when considering how bots can amplify content or behavior across platforms.
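
To make the idea of blocks concrete, here is a small sketch in Python, where indentation marks a block. The list of posts and the keyword rule are invented for illustration; they are not taken from the book.

```python
# Toy data (assumption: a real bot would fetch posts from a platform).
posts = ["hello world", "new bot release", "lunch photo", "bot tips"]

reposted = []

# Loop block: the indented steps run once for each post.
for post in posts:
    # Conditional block: the indented steps run only if the rule matches.
    if "bot" in post:
        reposted.append(post)
        print(f"Reposting: {post}")

print(f"Reposted {len(reposted)} of {len(posts)} posts")
```

A single rule inside the conditional is applied to every item the loop visits, which is how one decision can turn into many repeated actions.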

    2. In order to understand how a bot is built and can work, we will now look at the different ways computer programs can be organized. We will cover a bunch of examples quickly here, to hopefully give you an idea of many options for how to write a program. Don’t worry if you don’t follow all of it, as we will go back over these one at a time in more detail throughout the book. In this section, we will not show actual Python computer programs (that will be in the next section). Instead, here we will focus on what programmers call “pseudocode,” which is a human language outline of a program. Pseudocode is intended to be easier to read and write. Pseudocode is often used by programmers to plan how they want their programs to work, and once the programmer is somewhat confident in their pseudocode, they will then try to write it in actual programming language code.

      This explanation of pseudocode is helpful because it lowers the barrier to understanding how bots are structured without requiring prior programming knowledge. Framing pseudocode as a planning and thinking tool emphasizes that building bots is not just a technical process, but also a conceptual one where ethical choices can be made early, before code is even written.

    1. The code-based method is the most deterministic and objective approach [8, 53, 57]. [8] It relies on explicit rules, test cases, or assertions to verify whether an agent’s response meets predefined criteria. This method is particularly effective for tasks with well-defined outputs, such as numerical calculations, structured query generation, or syntactic correctness in programming tasks

      query generation review
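
A minimal sketch of what a code-based check might look like, assuming the agent is asked to return a JSON object with a numeric `total` field. The response format, the field name and the function `check_response` are hypothetical and not taken from the cited paper.

```python
import json


def check_response(response: str, expected_total: float) -> list[str]:
    """Apply explicit rules to an agent response and return the failures.

    An empty list means every rule passed.
    """
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]

    if not isinstance(data, dict):
        return ["response is not a JSON object"]

    failures = []
    total = data.get("total")
    if "total" not in data:
        failures.append("missing 'total' field")
    elif isinstance(total, bool) or not isinstance(total, (int, float)):
        failures.append("'total' is not a number")
    elif abs(total - expected_total) > 1e-9:
        failures.append(f"expected total {expected_total}, got {total}")
    return failures


passed = check_response('{"total": 42}', 42)
wrong_value = check_response('{"total": 41}', 42)
malformed = check_response("the answer is 42", 42)
```

The same inputs always yield the same verdict, which is what makes this style of evaluation deterministic; the trade-off is that it only works when the expected output can be specified in advance.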

    1. You must provide information on your Facility Clearance Level (FCL) using the provided template (Attachment L-8). This includes company name, address, CAGE code, and FCL level. If your company is a joint venture, the FCL requirement may also apply to individual partners per 13 C.F.R. 121.103(h)(4).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The technical approach is strong and the conceptual framing is compelling, but several aspects of the evidence remain incomplete. In particular, it is unclear whether the reported changes in connectivity truly capture causal influences, as the rank metrics remain correlational and show discrepancies with the manipulation results.

      We agree that our functional connectivity ranking analyses cannot establish causal influences. As discussed in the manuscript, besides learning-related activity changes, the functional connectivity may also be influenced by neuromodulatory systems and internal state fluctuations. In addition, the spatial scope of our recordings is still limited compared to the full network implicated in visual discrimination learning, which may bias the ranking estimates. In future, we aim to achieve broader region coverage and integrate multiple complementary analyses to address the causal contribution of each region.

      The absolute response onset latencies also appear slow for sensory-guided behavior in mice, and it is not clear whether this reflects the method used to define onset timing or factors such as task structure or internal state.

      We believe this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This might lead to later estimates than other studies, such as using the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-max response (Goldbach, Akitake, Leedy, & Histed, 2021).
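
The consecutive-window criterion can be sketched as follows, assuming the per-window significance tests (firing rate above baseline, p < 0.05) have already been computed and are passed in as booleans. The function name and the choice to report the start of the first window in the run are assumptions for illustration, not details given in the response.

```python
def onset_latency_ms(significant, window_ms=25, min_consecutive=3):
    """Return the onset latency in ms, or None if no onset is found.

    `significant[i]` is True when window i (starting at i * window_ms after
    stimulus onset) passed the test against baseline. Onset is taken as the
    start of the first run of `min_consecutive` significant windows.
    """
    run_length = 0
    for i, is_significant in enumerate(significant):
        run_length = run_length + 1 if is_significant else 0
        if run_length == min_consecutive:
            first_window = i - min_consecutive + 1
            return first_window * window_ms
    return None


# Toy example: an isolated significant window at 25 ms is ignored;
# the sustained response begins with the window starting at 75 ms.
flags = [False, True, False, True, True, True, True]
latency = onset_latency_ms(flags)
```

Requiring a sustained run discards isolated significant windows, which makes the estimate more conservative than a first-spike or half-maximum latency.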

      The estimation of response onset latency in our study may also be affected by potential internal state fluctuations of the mice. We used the time before visual stimulus onset as baseline firing. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase the variability of the baseline and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation of the formation of compressed activity sequence in CR trials during learning.

      Furthermore, the small number of animals, combined with extensive repeated measures, raises questions about statistical independence and how multiple comparisons were controlled.

      We agree that a larger sample size would strengthen the robustness of the findings. However, as noted above, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      The optogenetic experiments, while intended to test the functional relevance of rank increasing regions, leave it unclear how effectively the targeted circuits were silenced. Without direct evidence of reliable local inhibition, the behavioral effects or lack thereof are difficult to interpret.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy.

      Details on spike sorting are limited.

      We have provided more details on spike sorting in the Methods section, including the exact parameters used in the automated sorting algorithm and the subsequent manual curation criteria.

      Reviewer #2 (Public review):

      Weaknesses:

      I had several major concerns:

      (1) The number of mice was small for the ephys recordings. Although the authors start with 7 mice in Figure 1, they then reduce to 5 in panel F. And in their main analysis, they minimize their analysis to 6/7 sessions from 3 mice only. I couldn't find a rationale for this reduction, but in the methods they do mention that 2 mice were used for fruitless training, which I found no mention in the results. Moreover, in the early case, all of the analysis is from 118 CR trials taken from 3 mice. In general, this is a rather low number of mice and trial numbers. I think it is quite essential to add more mice.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      As we noted in our response to Reviewer #1, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve high-quality unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. These improvements will enable us to collect data from a larger sample size and extract more precise insights into mesoscale dynamics during learning.

      (2) Movement analysis was not sufficient. Mice learning a go/no-go task establish a movement strategy that is developed throughout learning and is also biased towards Hit trials. There is an analysis of movement in Figure S4, but this is rather superficial. I was not even sure that the 3 mice in Figure S4 are the same 3 mice in the main figure. There should be also an analysis of movement as a function of time to see differences. Also for Hits and FAs. I give some more details below. In general, most of the results can be explained by the fact that as mice gain expertise, they move more (also in CR during specific times) which leads to more activation in frontal cortex and more coordination with visual areas. More needs to be done in terms of analysis, or at least a mention of this in the text.

      Owing to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions; we have added these results and the related discussion to the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for the striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (3) Most of the figures are over-detailed, and it is hard to understand the take-home message. Although the text is written succinctly and rather short, the figures are mostly overwhelming, especially Figures 4-7. For example, Figure 4 presents 24 brain plots! For rank input and output rank during early and late stim and response periods, for early and expert and their difference. All in the same colormap. No significance shown at all. The Δrank maps for all cases look essentially identical across conditions. The division into early and late time periods is not properly justified. But the main take home message is positive Δrank in OFC, V2M, V1 and negative Δrank in ThalMD and Str. In my opinion, one trio map is enough, and the rest could be bumped to the Supplementary section, if at all. In general, the figure in several cases do not convey the main take home messages. See more details below.

      We thank the reviewer for this valuable critique. The statistical significance corresponding to the brain plots (Figures 4 and 5) was presented in Figures S3 and S5 (now Figures S5 and S7 in the revised manuscript), but we agree that the figures can be simplified to focus on the key results.

      In the revised manuscript, we have condensed these figures to focus on the most important comparisons to make the visual presentation more concise and the take-home message clearer.

      (4) The analysis is sometimes not intuitive enough. For example, the rank analysis of input and output rank seemed a bit over complex. Figure 3 was hard to follow (although a lot of effort was made by the authors to make it clearer). Was there any difference between the output and input analysis? Also, the time period seems redundant sometimes. Also, there are other network analysis that can be done which are a bit more intuitive. The use of rank within the 10 areas was not the most intuitive. Even a dimensionality reduction along with clustering can be used as an alternative. In my opinion, I don't think the authors should completely redo their analysis, but maybe mention the fact that other analyses exist

      We appreciate the reviewer’s comment. In brief, the input- and output-rank analyses yielded largely similar patterns across regions in CR trials, although some differences were observed in certain areas (e.g., striatum) in Hit trials, where the magnitude of rank change was not identical between input and output measures. We have condensed the figures to show only averaged rank results, and the colormap has been updated to better convey the message.

      We did explore dimensionality reduction applied to the ranking data. However, the results were not intuitive either, and they required additional interpretation without bringing further insight. Still, we acknowledge that other analysis approaches might provide complementary insights.

      Reviewer #3 (Public review):

      Weaknesses:

      The weakness is also related to the strength provided by the method. It is demonstrated in the original method that this approach in principle can track individual units for four months (Luan et al, 2017). The authors have not showed chronically tracked neurons across learning. Without demonstrating that and taking advantage of analyzing chronically tracked neurons, this approach is not different from acute recording across multiple days during learning. Many studies have achieved acute recording across learning using similar tasks. These studies have recorded units from a few brain areas or even across brain-wide areas.

      We appreciate the reviewer’s important point. We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses. Concentrating probes in fewer regions would allow us to obtain enough units tracked across learning in future studies to fully exploit the advantages of this method.

      Another weakness is that major results are based on analyses of functional connectivity that is calculated using the cross-correlation score of spiking activity (TSPE algorithm). Functional connection strengthen across areas is then ranked 1-10 based on relative strength. Without ground truth data, it is hard to judge the underlying caveats. I'd strongly advise the authors to use complementary methods to verify the functional connectivity and to evaluate the mesoscale change in subnetworks. Perhaps the authors can use one key information of anatomy, i.e. the cortex projects to the striatum, while the striatum does not directly affect other brain structures recorded in this manuscript

      We agree that the functional connectivity measured in this study relies on statistical correlations rather than direct anatomical connections. We plan to test the functional connection data with shorter cross-correlation delay criteria to see whether the results are consistent with anatomical connections and whether the original findings still hold.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The small number of mice, each contributing many sessions, complicates the  interpretation of the data. It is unclear how statistical analyses accounted for the small  sample size, repeated measures, and non-independence across sessions, or whether  multiple comparisons were adequately controlled.

      We recognize the limitation imposed by the small number of animals; the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size. We agree that a larger sample size would strengthen the robustness of the findings. However, as noted below, the current dataset has inherent limitations in both the scope of recorded regions and the behavioral paradigm.

      Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      (2) The ranking approach, although intuitive for visualizing relative changes in  connectivity, is fundamentally descriptive and does not reflect the magnitude or  reliability of the connections. Converting raw measures into ordinal ranks may obscure  meaningful differences in strength and can inflate apparent effects when the underlying  signal is weak.

      We agree with this important point. As stated in the manuscript, our motivation for the ranking approach was that differences in firing rates might bias the cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions. We acknowledge, however, that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We have added the limitations of the ranking approach to the discussion section and emphasized that future studies need better analysis approaches that can assess functional connection dynamics more accurately, without bias from firing rates.

      (3) The absolute response onset latencies also appear quite slow for sensory-guided  behavior in mice, and it remains unclear whether this reflects the method used to  determine onset timing or factors such as task design, sensorimotor demands, or  internal state. The approach for estimating onset latency by comparing firing rates in  short windows to baseline using a t-test raises concerns about robustness, as it may  be sensitive to trial-to-trial variability and yield spurious detections.

      We agree that this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This criterion might yield later estimates than those used in other studies, such as the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-maximal response (Goldbach, Akitake, Leedy, & Histed, 2021).

      The estimation of response onset latency in our study may also be affected by fluctuations in the internal state of the mice. We used the period before visual stimulus onset as the baseline. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase the variability of the baseline and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation that a compressed activity sequence formed in CR trials during learning.
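
      For illustration, the onset criterion described in this response (firing rate above baseline at p < 0.05 for at least 3 consecutive 25-ms windows) could be sketched as follows. This is a minimal sketch, not the analysis code used in the study. The function name, the array layout, the use of a paired per-window t-test, and the requirement of a positive t statistic are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def onset_latency_ms(rates, baseline, bin_ms=25.0, n_consecutive=3, alpha=0.05):
    """Return the start time (ms) of the first run of significant windows.

    rates    : (n_trials, n_bins) post-stimulus firing rates, one column per window
    baseline : (n_trials,) pre-stimulus firing rate of each trial
    Returns None when no run of `n_consecutive` significant windows exists.
    """
    rates = np.asarray(rates, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Assumption: paired comparison, i.e. a one-sample t-test on the
    # trial-wise difference between each window and the trial's baseline.
    t, p = stats.ttest_1samp(rates - baseline[:, None], 0.0, axis=0)
    significant = (t > 0) & (p < alpha)

    run = 0
    for i, is_sig in enumerate(significant):
        run = run + 1 if is_sig else 0
        if run == n_consecutive:
            return (i - n_consecutive + 1) * bin_ms
    return None


# Toy data: 10 trials, 8 windows, response begins in the 5th window (100 ms).
noise = np.tile([1.0, -1.0], 5)
baseline = np.full(10, 5.0)
rates = 5.0 + noise[:, None] * np.ones((10, 8))
rates[:, 4:] += 5.0
print(onset_latency_ms(rates, baseline))  # 100.0
```

      Because the run must span several windows, a single noisy window cannot trigger a detection, which is consistent with the criterion being described as conservative.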

      (4) Details on spike sorting are very limited. For example, defining single units only by  an interspike interval threshold above one millisecond may not sufficiently rule out  contamination or overlapping clusters. How exactly were neurons tracked across days  (Figure 7B)?

      We have added more details on spike sorting, including the processing steps and important parameters used in the automated sorting algorithm. Only the clusters well isolated in feature space were accepted in manual curation.

      We attempted to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      This is now stated more clearly in the discussion section.

      (5) The optogenetic experiments, while designed to test the functional relevance of  rank-increasing regions, also raise questions. The physiological impact of the inhibition  is not characterized, making it unclear how effectively the targeted circuits were  actually silenced. Without clearer evidence that the manipulations reliably altered local  activity, the interpretation of the observed or absent behavioral effects remains  uncertain.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy. 

      (6) The task itself is relatively simple, and the anatomical coverage does not include  midbrain or cerebellar regions, limiting how broadly the findings can be generalized to more flexible or ethologically relevant forms of decision-making.

      We appreciate this advice and have expanded the existing discussion to more explicitly state that the relatively simple task design and anatomical coverage might limit the generalizability of our findings.

      (7) The abstract would benefit from more consistent use of tense, as the current mix of  past and present can make the main findings harder to follow. In addition, terms like  "mesoscale network," "subnetwork," and "functional motif" are used interchangeably in  places; adopting clearer, consistent terminology would improve readability.

      We have changed several verbs in the abstract to the past tense, and we have adopted more consistent terminology by replacing “functional motif” with “subnetwork”. We still feel that “mesoscale network” and “subnetwork” emphasize different aspects of the results depending on the context, so both terms have been retained.

      (8) The discussion could better acknowledge that the observed network changes may  not reflect task-specific learning alone but could also arise from broader shifts in  arousal, attention, or motivation over repeated sessions.

      We have expanded the existing discussion to better acknowledge the possible effects from broader shifts in arousal, attention, or motivation over repeated sessions.

      (9) The figures would also benefit from clearer presentation, as several are dense and  not straightforward to interpret. For example, Figure S8 could be organized more  clearly to highlight the key comparisons and main message

      We have simplified the over-detailed brain plots in Figures 4-5, and the plots in Figures 6 and S8 (now S10 in the revised manuscript).

      (10) Finally, while the manuscript notes that data and code are available upon request,  it would strengthen the study's transparency and reproducibility to provide open access  through a public repository, in line with best practices in the field.

      The spiking data, behavior data, and code for the core analyses in the manuscript are now shared in a public repository (Dryad), and we have changed the description in the Data Availability section accordingly.

      Reviewer #2 (Recommendations for the authors):

      (A) Introduction:

      (1) "Previous studies have implicated multiple cortical and subcortical regions in visual  task learning and decision-making". No references here, and also in the next sentence.

      The references appeared later in the introduction; we have now added them here as well.

      We also added one review on cortical-subcortical neural correlates in goal-directed behavior (Cruz et al., 2023).

      (2) Intro: In general, the citation of previous literature is rather minimal, too minimal.  There is a lot of studies using large scale recordings during learning, not necessarily  visual tasks. An example for brain-wide learning study in subcortical areas is Sych et  al. 2022 (cell reports). And for wide-field imaging there are several papers from the  Helmchen lab and Komiyama labs, also for multi-area cortical imaging.

      We appreciate this advice. We included mainly the literature on visual task learning to keep the scope focused on the regions and task we actually explored in this study. We fear that if we expanded the introduction to include all the large-scale imaging and recording studies in the learning field, the background might become too broad.

      We have included (Sych, Fomins, Novelli, & Helmchen, 2022) for its relevance and importance in the field.

      (3) In the intro, there is only a mention of a recording of 10 brain regions, with no  mention of which areas, along with their relevance to learning. This is mentioned in the  results, but it will be good in the intro.

      The area names have now been added to the introduction.

      (B) Results:

      (1) Were you able to track the same neurons across the learning profile? This is not  stated clearly.

      We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      We now state this more clearly in the discussion section.

      (2) Figure 1 starts with 7 mice, but only 5 mice are in the last panel. Later it goes down  to 3 mice. This should be explained in the results and justified.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      (3) I can't see the electrode tracks in Figure 1d. If they are flexible, how can you make  sure they did not bend during insertion? I couldn't find a description of this in the  methods also.

      The electrode shanks were ultra-thin (1-1.5 µm), and it was usually difficult to recover observable tracks or electrodes in sections.

      The ultra-flexible probes could not penetrate the brain on their own (since they are flexible) and had to be shuttled into position by tungsten wires through holes designed at the tips of the array shanks. The tungsten wires were assembled onto the electrode array before implantation; this was described in the section on electrode array fabrication and assembly. We also included a description of the retraction of the guiding tungsten wires in the surgery section to avoid confusion.

      As a further attempt to verify the accuracy of implantation depth, we also measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end at a slightly deeper location in cortex (142.1 ± 55.2 μm, n = 7 shanks) and a slightly shallower location in subcortical structures (-122.6 ± 71.7 μm, n = 7 shanks). We added these results as a new Figure S1 to accompany Figure 1.

      (4) In the spike raster in 1E, there seems to be ~20 cells in V2L, for example, but in 1F, the number of neurons doesn't go below 40. What is the difference here?

      We checked Figure 1F, and the plotted dots do go below 40, down to ~20. Perhaps the file that the reviewer received was not displayed correctly?

      (5) The authors focus mainly on CR, but during learning, the number of CR trials is  rather low (because they are not experts). This can also be seen in the noisier traces  in Figure 2a. Do the authors account for that (for example by taking equal trials from  each group)? 

      We accounted for this by constructing bootstrap-resampled datasets with only 5 trials per session in both the early stage and the expert stage. The mean trace of the 500 datasets again showed an overall decrease in CR trial firing rate during task learning, with temporal dynamics highly similar to those of the original data.

      The figure has been added to the supplementary materials (as Figure S3 in the revised manuscript).
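
      As an illustration of the trial-matching control described above, the sketch below draws 5 trials per session, pools them, and repeats the draw 500 times. It is a hedged example rather than the code used for Figure S3. The function name, the input layout (one array of trial-by-time firing rates per session), and sampling with replacement are assumptions.

```python
import numpy as np


def bootstrap_mean_trace(sessions, n_trials=5, n_boot=500, seed=0):
    """Trial-matched bootstrap of the mean firing-rate trace.

    sessions : list of arrays, each (n_trials_in_session, n_timebins)
    Returns (mean_trace, boot_traces), where boot_traces has shape
    (n_boot, n_timebins) and mean_trace is its average over resamples.
    """
    rng = np.random.default_rng(seed)
    n_bins = sessions[0].shape[1]
    boot_traces = np.empty((n_boot, n_bins))
    for b in range(n_boot):
        # Assumption: trials are drawn with replacement within each session,
        # so every session contributes exactly n_trials trials per resample.
        picked = [s[rng.choice(len(s), size=n_trials, replace=True)] for s in sessions]
        boot_traces[b] = np.concatenate(picked, axis=0).mean(axis=0)
    return boot_traces.mean(axis=0), boot_traces


# Toy data: two sessions with different trial counts but constant traces.
early = [np.tile([1.0, 2.0, 3.0], (8, 1)), np.tile([3.0, 4.0, 5.0], (40, 1))]
mean_trace, boot = bootstrap_mean_trace(early)
print(mean_trace, boot.shape)
```

      Fixing the number of trials per session means that a stage with few CR trials (early learning) and a stage with many (expert) are compared on equal footing.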

      (6) From Figure 2a, it is evident that Hit trials increase response when mice become  experts in all brain areas. The authors have decided to focus on the response onset  differences in CRs, but the Hit responses display a strong difference between naïve  and expert cases.

      Judging from the learning curve, the mice in this task learned to inhibit their licking when the No-Go stimuli appeared, which is the main reason we focused on this trial type.

      Movement effects and potential licking artefacts in Hit trials also restricted our interpretation of these trials.

      (7) Figure 3 is still a bit cumbersome. I wasn't 100% convinced of why there is a need  to rank the connection matrix. I mean when you convert to rank, essentially there could  be a meaningful general reduction in correlation, for example during licking, and this  will be invisible in the ranking system. Maybe show in the supp non-ranked data, or  clarify this somehow

      We agree with this important point. As stated in the manuscript and in our response to Reviewer #1, our motivation for the ranking approach was that differences in firing rates could bias the cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions. We acknowledge, however, that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We have added the limitations of the ranking approach to the discussion section and emphasized that future studies need better analysis approaches that can assess functional connection dynamics more accurately, without bias from firing rates.

      (8) Figure 4a x label is in ms, which is different than previous time labels, which were seconds.

      We have now changed all time labels from Figure 2 onward to milliseconds.

      (9) Figure 4 input and output rank look essentially the same.

      We have compressed the brain plots in Figures 4-5 to better convey the take-home message.

      (10) Also, what is the late and early stim period? Can you mark each period in panel A? Early stim period is confusing with early CR period. Same for early response and late response.

      The time periods were defined in the figure legends. We now mark each period in the figure to avoid confusion.

      (11) Looking at panel B, I don't see any differences between delta-rank in early stim,  late stim, early response, and late response. Same for panel c and output plots.

      The rankings were indeed relatively stable across time periods. The plots are now compressed and show a mean rank value.

      (12) Panels B and C are just overwhelming and hard to grasp. Colors are similar both  to regular rank values and delta-rank. I don't see any differences between all  conditions (in general). In the text, the authors report only M2 to have an increase in  rank during the response period. Late or early response? The figure does not go well  with the text. Consider minimizing this plot and moving stuff to supplementary.

      The colormaps have been changed to avoid confusion, and the brain plots are now compressed.

      (13) In terms of a statistical test for Figure 4, a two-way ANOVA was done, but over what? What are the statistics and p-values for the test? Is there a main effect of time also? Is there a significant interaction? Was this done on all mice together? How many mice? If I understand correctly, the post-hoc statistics are presented in the supplementary, but from the main figure, you cannot know what is significant and what is not.

      For these figures we were mainly concerned with the post-hoc statistics which described the changes in the rankings of each region across learning.

      We have changed the description to “t-test with Sidak correction” to avoid confusion.
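
      For reference, the Šidák correction named above adjusts each p-value as p_adj = 1 - (1 - p)^m for m comparisons, which is equivalent to testing each comparison at alpha_per_test = 1 - (1 - alpha)^(1/m). A minimal sketch follows. The function names and the example values are hypothetical, and the choice of m = 10 (one comparison per recorded region) is an assumption made only for illustration.

```python
def sidak_adjust(p_values):
    """Return Sidak-adjusted p-values for a family of m comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical p-values from three post-hoc t-tests.
print([round(p, 6) for p in sidak_adjust([0.01, 0.04, 0.20])])
# Assumption: 10 comparisons, one per recorded region.
print(round(sidak_alpha(0.05, 10), 6))
```

      The two forms give the same decisions: an adjusted p-value falls below alpha exactly when the raw p-value falls below the per-comparison threshold.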

      (14) In the legend of Figure 4, it is reported that 610 expert CR trials from 6 sessions,  instead of 7 sessions. Why was that? Also, like the previous point, why only 3 mice?

      Behavioral data for all the sessions used are shown in Figure S1. Only 3 mice were used for the learning group; the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size.

      (15) Body movement analysis: was this done in a different cohort of mice? Only now do I understand why there was a division into early and late stim periods. In supp 4, there should be a trace of each body part in CR expert versus naïve. This should also be done for Hit trials as a sanity check. I am not sure that the brightness difference between consecutive frames is the best measure. Rather try to calculate frame-to-frame correlation. In general, body movement analysis is super important and should be carefully analyzed.

      Owing to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions; we have added these results and the related discussion to the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for the striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (16) For Hit trials, in the striatum, there is an increase in input rank around the  response period, and from Figure S6 it is clear that this is lick-related. Other than that,  the authors report other significant changes across learning and point out to Figure 5b,c. I couldn't see which areas and when it occurred.

      We did naturally expect the activity in striatum to be strongly related to movement.

      With Figure S6 (now S7) we wished to show that the observed rank increase for striatum could not simply be attributed to changes in time of lick initiation.

      Some readers may argue that during learning the mice might have learned to lick intensely only after response signal onset, causing the observed rise in input rank after the response signal. We therefore realigned the spikes in each trial to the time of the first lick, and a strong difference could still be observed between the early training stage and the expert stage.

      We still cannot fully rule out effects from more subtle movement changes, as the face motion energy did increase in the early response period. This result and the related discussion have been added to the results section of the revised manuscript.

      (17) Figure 6, again, is rather hard to grasp. There are 16 panels, spread over 4 areas,  input and output, stim and response. What is the take home message of all this?  Visually, it's hard to differentiate between each panel. For me, it seems like all the  panels indicate that for all 4 areas, both in output and input, frontal areas increase in  rank. This take-home message can be visually conveyed in much less tedious ways.  This simpler approach is actually conveyed better in the text than in the figures  themselves. Also, the whole explanation on how this analysis was done, was not clear  from the text. If I understand it, you just divided and ranked the general input (or  output) into individual connections? If so, then this should be better explained.

      We appreciate this advice and have compressed the figures to better convey the main message. The ranking for Figure 6 and Figure S8 (now Figure S9) is explained in the left panel of Figure 3C. Each non-zero element in the connection matrix was assigned a rank from 1 to 10, with a value of 10 representing the 10% strongest non-zero elements in the matrix.

      We have updated the legend of Figure 3, and we have also updated the description in the Methods (Connection rank analyses) to explain more clearly how the analyses were applied in subsequent figures.
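
      The ranking step described above (non-zero elements of the connection matrix mapped to ranks 1-10, with 10 marking the strongest 10%) could be sketched as follows. This is an illustrative reading of the description, not the authors' implementation. The function name is hypothetical, connection strengths are assumed to be non-negative, and ties are broken by position, which is an arbitrary choice.

```python
import numpy as np


def decile_rank(conn):
    """Map non-zero connection strengths to ranks 1-10 (10 = strongest 10%).

    conn : 2-D array of connection strengths; zeros mean "no connection"
    Returns an integer array of the same shape, with 0 where conn is 0.
    """
    conn = np.asarray(conn, dtype=float)
    ranks = np.zeros(conn.shape, dtype=int)
    nonzero = conn != 0
    values = conn[nonzero]
    # Ordinal position of each value among the non-zero entries (0 = weakest).
    order = values.argsort(kind="stable").argsort(kind="stable")
    # Split the ordered values into ten equal-sized bins labelled 1..10.
    ranks[nonzero] = order * 10 // values.size + 1
    return ranks


# Toy matrix: 20 non-zero strengths (1..20) and 5 absent connections.
conn = np.zeros((5, 5))
conn.flat[:20] = np.arange(1, 21)
print(decile_rank(conn))
```

      Because only the ordering of strengths survives this transformation, a uniform increase or decrease across all connections leaves the ranks unchanged, which is the limitation raised by the reviewers.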

      (18) Figure 7: Here, the authors perform a ROC analysis between go and no-go stimuli. They balance between choice, but there is still an essential difference between a hit and a FA in terms of movement and licks. That is maybe why there is a big difference in selective units during the response period. For example, during a Hit trial the mouse licks and gets a reward, resulting in more licking and excitement. In FAs, the mouse licks, but gets punished, which causes a reduction in additional licking and movements. This could be a simple explanation why the ROC was good in the late response period. Body movement analysis of Hit and FA should be done as in Figure S4.

      We appreciate this insightful advice.

      Although we balanced the numbers of basic trial types, we could not rule out intrinsic differences in the amount of movement between FA trials and Hit trials, which are likely the reason for the large proportion of encoding neurons in the response period.

      We have added this point to both the results section and the discussion section, along with the need for a more carefully designed behavioral paradigm to disentangle task information.

      (19) The authors also find selective neurons before stimulus onset, and refer to trial  history effects. This can be directly checked, that is if neurons decode trial history.

      We attempted encoding analyses on trial history, but regrettably for our dataset we could not find enough trials to construct a dataset with fully balanced trial history, visual stimulus and behavior choice.

      (20) Figure 7e. What is the interpretation for these results? That areas which peaked  earlier had more input and output with other areas? So, these areas are initiating  hubs? Would be nice to see ACC vs Str traces from B superimposed on each other.  Having said this, the Str is the only area to show significant differences in the early  stim period. But is also has the latest peak time. This is a bit of a discrepancy.

      We appreciate this important point.

      The limited anatomical coverage of brain regions restricted our interpretation of these findings. These areas could be initiating hubs, or early receivers of signals from the true initiating hubs that were not monitored in our study.

      The Str trace was in fact above the ACC trace, especially in the response period. This could be explained by point 18 above: since we could not rule out intrinsic differences in the amount of movement between FA trials and Hit trials, and since striatal activity is strongly related to movement, the Str trace may reflect movement-related differences in spike counts between FA and Hit trials more than differences related to the visual stimulus.

      This further shows the need for a more carefully designed behavioral paradigm to disentangle task information.

      The striatum trace also did not in fact show a true double-peak form like the traces in other regions; it ramped up in the stimulus period and peaked only in the response period. This description has been added to the results section.

      In the early stim period, the Striatum did show significant differences in average percent of encoding neurons, as the encoding neurons were stably high in expert stage. The striatum activity is more directly affected Still the percentage of neurons only reached peak in late stimulus period.

      (21) For the optogenetic silencing experiments, how many mice were trained for each  group? This is not mentioned in the results section but only in the legend of Figure 8. This part is rather convincing in terms of the necessity for OFC and V2M

      We have now included the numbers of mice in the results section as well.

      (C) Discussion

      (1) There are several studies linking sensory areas to frontal networks that should be mentioned, for example, Esmaeili et al., 2022, Matteucci et al., 2022, Guo et al., 2014, Gallero Salas et al., 2021, Jerry Chen et al., 2015. Sonja Hofer papers, maybe. Probably more.

      We appreciate this advice. We have now included one of the mentioned papers (Esmaeili et al., 2022) in the results section and discussion section for its direct characterization of the enhanced coupling between somatosensory region and frontal (motor) region during sensory learning. The other studies mentioned here seem to focus more on the differences in encoding properties between regions along specific cortical pathways, rather than functional connection or interregional activity correlation, and we feel they are not directly related to the observations discussed.

      (2) The reported reorganization of brain-wide networks with shifts in time is best described also in Sych et al. 2021.

      We regret that we did not include this important study; we have now cited it in the discussion section.

      (3) Regarding the discussion about more widespread stimulus encoding after learning,  the results indicate that the striatum emerges first in decoding abilities (Figure 7c left  panel), but this is not discussed at all.

      We briefly discussed this in the result section. We tend to attribute this to trial history signal in striatum, but since the structure of our data could not support a direct encoding analysis on trial history, we felt it might be inappropriate to over-interpret the results.

      (4) An important issue which is not discussed is the contribution of movement, which was shown to have a strong effect on brain-wide dynamics (Steinmetz et al 2019; Musall et al 2019; Stringer et al 2019; Gilad et al 2018). The authors do have some movement analysis, but this is not enough. At least a discussion of the possible effects of movement on learning-related dynamics should be added.

      We have included these studies in discussion section accordingly. Since the movement analyses were done in a separate cohort of mice, we have made our limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (D) Methods

      (1) How was the light delivery of the optogenetic experiments done? Via fiber  implantation in the OFC? And for V2M? If the red laser was on the skull, how did it get  to the OFC?

      The fibers were placed on cortex surface for V2M group, and were implanted above OFC for OFC manipulation group. These were described in the viral injection part of the methods section.

      (2) No data given on how electrode tracking was done post hoc

      As noted in our response to the advice 3 in results section, the electrode shanks were ultra-thin (1-1.5 µm) and it was usually difficult to recover observable tracks or electrodes in section.

      As an attempt to verify the accuracy of implantation depth, we measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end in slightly deeper location in cortex (142.1 ± 55.2 μm, n = 7 shanks), and slightly shallower location in subcortical structure (-122.6 ± 71.7 μm, n = 7 shanks). We added these results as new Figure S1 to accompany Figure 1.

      Reviewer #3 (Recommendations for the authors):

      (1) The manuscript uses decision-making in the title, abstract and introduction.  However, nothing is related to decision learning in the results section. Mice simply  learned to suppress licking in no-go trials. This type of task is typically used to study behavioral inhibition. And consistent with this, the authors mainly identified changes  related to network on no-go trials. I really think the title and main message is  misleading. It is better to rephrase it as visual discrimination learning. In the  introduction, the authors also reviewed multiple related studies that are based on  learning of visual discrimination tasks.

      We do view the Go/No-Go task as a specific genre of decision-making task, as there is literature that discusses this task as a decision-making task under the framework of signal detection theory or the updating of item values (Carandini & Churchland, 2013; Veling, Becker, Liu, Quandt, & Holland, 2022).

      We do acknowledge the essential differences between the Go/No-Go task and the tasks that require the animal to choose between alternatives, and since we have now realized some readers may not accept this task as a decision task, we have changed the title to visual discrimination task as advised.

      (2) Learning induced a faster onset on CR trials. As the no-go stimulus was not  presented to mice during early stages of training, this change might reflect the  perceptual learning of relevant visual stimulus after repeated presentation. This further  confirms my speculation, and the decision-making used in the title is misleading. 

      We have changed the title to visual discrimination task accordingly.

      (3) Figure 1E, show one hit trial. If the second 'no-go stimulus' is correct, that trial  might be a false alarm trial as mice licked briefly. I'd like to see whether continuous  licking can cause motion artifacts in recording. 

      We appreciate this important point. There were indeed licking artifacts with continuous licking in Hit trials, which was part of the reason we focused our analyses on CR trials. Opto-based lick detectors may help to reduce the artefacts in future studies.

      (4) What is the rationale for using a threshold of d' < 2 as the early-stage data and d'>3  as expert stage data?

      The thresholds were chosen as a trade-off, based on the practical need to gather enough CR trials in the early training stage while maintaining a relatively low performance level.

      Assume the mice showed lick response in 95% of Go stimulus trials, then d' < 2 corresponded to the performance level at which the mouse correctly rejected less than 63.9% of No-Go stimulus trials, and d' > 3 corresponded to the performance level at which the mouse correctly rejected more than 91.2% of No-Go stimulus trials.
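
      The arithmetic behind these percentages can be reproduced with a short sketch. It assumes the standard equal-variance signal detection definition, d' = z(hit rate) - z(false-alarm rate), which the response does not state explicitly; the function name `correct_rejection_rate` is hypothetical and introduced here only for illustration.

```python
from statistics import NormalDist


def correct_rejection_rate(d_prime: float, hit_rate: float = 0.95) -> float:
    """Correct-rejection rate implied by a given d' and hit rate.

    Assumes d' = z(hit rate) - z(false-alarm rate) with unit-variance
    Gaussians (an assumption of this sketch, not stated in the response).
    """
    std_normal = NormalDist()              # mean 0, standard deviation 1
    z_hit = std_normal.inv_cdf(hit_rate)   # z-score of the hit rate
    z_fa = z_hit - d_prime                 # z-score of the false-alarm rate
    fa_rate = std_normal.cdf(z_fa)         # back to a probability
    return 1.0 - fa_rate                   # CR rate = 1 - FA rate


for threshold in (2.0, 3.0):
    cr = correct_rejection_rate(threshold)
    print(f"d' = {threshold:.0f} -> CR rate = {100 * cr:.1f}%")
# d' = 2 -> CR rate = 63.9%
# d' = 3 -> CR rate = 91.2%
```

      Under that assumption, the two thresholds map onto the 63.9% and 91.2% correct-rejection rates quoted above; a different assumed hit rate shifts both values.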

      (5) Figure 2A, there is a change in baseline firing rates in V2M, MDTh, and Str. There  is no discussion. But what can cause this change? Recording instability, problem in  spiking sorting, or learning?

      It is highly possible that the firing rates before visual stimulus onset are affected by the previous reward history and task engagement state of the mice. Notably, though recorded simultaneously in the same sessions, the changes in baseline firing rates on CR trials in the V2M region were not observed on Hit trials.

      Thus, though we cannot completely rule out the possibility of recording instability, we see this as evidence that changes in trial history or task engagement during learning affect firing rates.

      References:

      Carandini, M., & Churchland, A. K. (2013). Probing perceptual decisions in rodents. Nat Neurosci, 16(7), 824-831. doi:10.1038/nn.3410.

      Cruz, K. G., Leow, Y. N., Le, N. M., Adam, E., Huda, R., & Sur, M. (2023). Cortical-subcortical interactions in goal-directed behavior. Physiol Rev, 103(1), 347-389. doi:10.1152/physrev.00048.2021

      Esmaeili, V., Oryshchuk, A., Asri, R., Tamura, K., Foustoukos, G., Liu, Y., Guiet, R., Crochet, S., & Petersen, C. C. H. (2022). Learning-related congruent and incongruent changes of excitation and inhibition in distinct cortical areas. PLOS Biology, 20(5), e3001667. doi:10.1371/journal.pbio.3001667

      Goldbach, H. C., Akitake, B., Leedy, C. E., & Histed, M. H. (2021). Performance in even a simple perceptual task depends on mouse secondary visual areas. Elife, 10, e62156. doi:10.7554/eLife.62156.

      Siegle, J. H., Jia, X., Durand, S., Gale, S., Bennett, C., Graddis, N., Heller, G., Ramirez, T. K., Choi, H., Luviano, J. A., Groblewski, P. A., Ahmed, R., Arkhipov, A., Bernard, A., Billeh, Y. N., Brown, D., Buice, M. A., Cain, N., Caldejon, S., Casal, L., Cho, A., Chvilicek, M., Cox, T. C., Dai, K., Denman, D. J., de Vries, S. E. J., Dietzman, R., Esposito, L., Farrell, C., Feng, D., Galbraith, J., Garrett, M., Gelfand, E. C., Hancock, N., Harris, J. A., Howard, R., Hu, B., Hytnen, R., Iyer, R., Jessett, E., Johnson, K., Kato, I., Kiggins, J., Lambert, S., Lecoq, J., Ledochowitsch, P., Lee, J. H., Leon, A., Li, Y., Liang, E., Long, F., Mace, K., Melchior, J., Millman, D., Mollenkopf, T., Nayan, C., Ng, L., Ngo, K., Nguyen, T., Nicovich, P. R., North, K., Ocker, G. K., Ollerenshaw, D., Oliver, M., Pachitariu, M., Perkins, J., Reding, M., Reid, D., Robertson, M., Ronellenfitch, K., Seid, S., Slaughterbeck, C., Stoecklin, M., Sullivan, D., Sutton, B., Swapp, J., Thompson, C., Turner, K., Wakeman, W., Whitesell, J. D., Williams, D., Williford, A., Young, R., Zeng, H., Naylor, S., Phillips, J. W., Reid, R. C., Mihalas, S., Olsen, S. R., & Koch, C. (2021). Survey of spiking in the mouse visual system reveals functional hierarchy. Nature, 592(7852), 86-92. doi:10.1038/s41586-020-03171-x

      Sych, Y., Fomins, A., Novelli, L., & Helmchen, F. (2022). Dynamic reorganization of the cortico-basal ganglia-thalamo-cortical network during task learning. Cell Rep, 40(12), 111394. doi:10.1016/j.celrep.2022.111394

      Veling, H., Becker, D., Liu, H., Quandt, J., & Holland, R. W. (2022). How go/no-go training changes behavior: A value-based decision-making perspective. Current Opinion in Behavioral Sciences, 47, 101206. doi:10.1016/j.cobeha.2022.101206

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      This study "Interpreting the Effects of DNA Polymerase Variants at the Structural Level" comprises an in-depth analysis of protein sequence variants in two DNA polymerase enzymes with particular emphasis on deducing the mechanistic impact in the context of cancer. The authors identify numerous variants for prioritisation in further studies, and showcase the effectiveness of integrating various data sources for inferring the mechanistic impact of variants.

      All the comments below are minor, I think the manuscript is exceptionally well written.

      > The main body of the manuscript has almost as much emphasis on usage of the MAVISp tool as analysis of the polymerase variants. I don't think this is an issue, as an illustrated example of proper usage is very handy. I do, however, think that the title and abstract should better reflect this emphasis. E.g. "Interpreting the Effects of DNA Polymerase Variants at the Structural Level with MAVISp". This would make the paper more discoverable to people interested in learning about the tool.

      We have changed the manuscript title according to the reviewer’s suggestions, and the current title is “Interpreting the Effects of DNA Polymerase Variants at the Structural Level using MAVISp and molecular dynamics simulations.”

      > Figure 1. I don't believe there is much value in showing the intersection between the datasets (especially since the in-silico saturation dataset intersects perfectly with all the others). As an alternative, I suggest a flow-chart or similar visual overview of the analysis pipeline.

      We moved the former Figure 1 to SI. We decided to keep it at least in SI because it provides guidance on the number of variants relative to the total reported across the different disease-related datasets annotated with the MAVISp toolkit. On the other hand, the suggestion of a visual scheme for the pipeline followed in the analyses is a great idea. We have thus added Figure 1, which illustrates the pipeline workflows for analysis of known pathogenic variants and for discovery of VUS and other unknown variants, as suggested by the reviewer.

      > Please note in the MAVISp dot-plot figure legends that the second key refers to the colour of the X-axis labels rather than the dots

      We have revised the code that produces the dotplot so the second key is placed closer to the x-axis and clearer to read.

      Missing figure reference (Figure XXX) at the bottom of page 16

      We apologize for this mistake. Figures, contents, and the order have changed significantly to address all reviewers’ comments; this statement is no longer included. Also, we have carefully proofread the final version of the manuscript before resubmitting it.


      Reviewer #2

      This manuscript reports a comprehensive study of POLE and POLD1 annotated clinical variants using a recently developed framework, MAVISp, that leverages scores and classifications from evolutionary-based variant effect predictors. The resource can be useful for the community. However, I have a number of major concerns regarding the methodology and the presentation of the results.

      On the choice of tools in MAVISp and interpretation of their outputs

      - Based on the ProteinGym benchmark: https://proteingym.org/benchmarks, GEMME outperforms EVE for predicting the pathogenicity of ClinVar mutations, with an AUC of 0.919 for GEMME compared to 0.914 for EVE. Thus, it is not clear to me why the authors chose to put more emphasis on EVE for predicting mutation pathogenicity. It seems that GEMME can better predict this property, without any adaptation or training on clinical labels.

      We appreciate this comment, but we should not exclude EVE entirely from our data collection or from VEP coverage under MAVISp, based on a difference in AUC of 0.005. It was not our intention to place more emphasis on EVE predictions, and we have revised it accordingly. We would like to clarify the workflow we use for applications of the MAVISp framework in “discovery mode,” i.e., for variants not reported as pathogenic in ClinVar. This relies on AlphaMissense to prioritize the pathogenic variants and then retain further only the ones that also have an impact according to DeMaSk, which provides further indication for loss/gain-of-fitness. DeMaSk nicely fits the MAVISp framework, as it was trained on data from experimental deep mutational scans, which we generally import in the EXPERIMENTAL_DATA module. We have revised the text to make this clearer. GEMME and EVE (or REVEL) can be used for complementary analysis in the discovery workflow. Other users of MAVISp data might want to combine them with a different design, and they have access to all the original scores in the MAVISp database CSV file and the code for downstream analysis to do so. The choice for our MAVISp discovery workflow is mainly dictated by the fact that we have noticed we do not always have full coverage of all variants in many protein instances for EVE, GEMME, and REVEL. In particular, since the reviewer highlights GEMME over EVE, GEMME is currently unavailable for a few cases in the MAVISp database. This is because we need to rely on an external web server to collect the data, which slows down data collection on our end.

      Additionally, we have encountered instances where GEMME was unable to provide an output for inclusion in the MAVISp entries. When we designed the workflow for variant characterization in focused studies, we also made practical considerations. We are also exploring the possibility of using pre-calculated GEMME scores from https://datadryad.org/dataset/doi:10.5061/dryad.vdncjsz1s, but we have encountered some challenges that deserve further investigation and consideration. For example, MAVISp annotations rely on the canonical isoform as reported in UniProt, which can lead to mismatches with the GEMME pre-computed scores. So far, we have identified a couple of entries whose canonical isoforms no longer match the one in the pre-computed GEMME score dataset. Another limitation is the absence of the original MSA files in the dataset, which we would need for a more in-depth comparison with the ones we used for our calculations. We are facing some challenges in reproducing the MSA output from the MMseqs2-based ColabFold protocol in this context, which need to be solved first. Overall, the dataset shows potential for integration into MAVISp, but we need to define the inclusion criteria and compare it with the existing results in more detail.

      Additionally, since the principle behind MAVISp is to provide a framework rooted in protein structure, AlphaMissense was the most reasonable choice for us as the primary indicator among the VEPs for our discovery workflow, and it has performed reasonably well in this case study and others.

      Of course, our discovery design is one of the many applications and designs that could be envisioned using the data provided and collected by MAVISp. We also include all raw scores in the database's final CSV files, allowing other end users to decide how to use them in their own computational design. The design choice we made for the discovery phase of focused studies, using MAVISp to identify variants of interest for further studies, has been applied in other publications (see https://elelab.gitbook.io/mavisp/overview/publications-that-used-mavisp-data) in some cases together with experiments. It is also a fair choice for the application, as the ultimate goal is to provide a catalog of variants for further studies that may have a potentially damaging impact, along with a corresponding structural mechanism.

      We have now revised the results section text where Table 1 is cited to clarify this. We also revised the terminology because we are using the VEPs' capability to predict damaging variants, rather than the pathogenic variants themselves. Experiments on disease models should validate our predictions before concluding whether a variant is pathogenic in a disease context, and we want to avoid misunderstandings among readers regarding our stance on this matter.

      - Which of the predictors, among AM, EVE, GEMME, and DeMaSk, provide a classification of variants and which ones provide continuous scores? This should be clarified in the text. If some predictors do not output a classification, then evaluating their performance on a classification task is unfair. The MAVISp framework sets thresholds on the predicted scores to perform the classification and it is unclear from reading the manuscript whether these thresholds are optimal or whether using universal cutoff values is pertinent. For instance, for GEMME, a recent study shows that fitting a Gaussian mixture to the predicted score distribution yields higher accuracy than setting a universal threshold (https://doi.org/10.1101/2025.02.09.637326). Along this line, for predictors that do not provide a classification, I am not convinced of the benefit for the users of having access to only binary labels, instead of the continuous scores. The users currently do not have any idea of whether each variant is borderline (close to threshold) or confident (far from threshold).

      We agree with the reviewer, and this is due to us not being sufficiently clear in the manuscript. We have now revised the first part of the results to clarify this and to explain how we use the MAVISp data for application to focused studies, where the goal is to identify the most interesting variants that are potentially damaging and have a linked structural mechanism. Of course, there are other applications for leveraging the data in the database. We do offer scores to variants instead of just classification labels in the MAVISp csv file. They can be accessed, together with the full dataset, through the MAVISp website and reused for any applications.

      Additionally, we used the scores in the revised manuscript for the VUS variant ranking (Figure 5), applying a strategy recently designed as an addition to the downstream analysis tool kit of MAVISp (https://github.com/ELELAB/MAVISp_downstream_analysis), thereby allowing the scores themselves to be taken into account. Also, in the final part of the manuscript, the VEP scores have been used to introduce the ACMG-like classification of the variants in response to reviewer 3 (Figure 9 and Tables S3-S4). We absolutely agree that it is informative to keep the continuous scores, and we have never overlooked this aspect. However, we also need a strategy with a simpler classification to highlight the most interesting variants among thousands or more to start an exploration. This is why we included the support with dotplots and lolliplots, for example. Our purpose here is to identify, among many cases, those with a potentially damaging signature (and thus we need a binary classification for simplicity). Next, we evaluate whether this signature entails a fitness effect (with DeMaSk), and finally, retain only the cases we can identify with a structural mechanism to study further.

      The thresholds we set as the default for data analysis of dotplots in GEMME and DeMaSk are discussed in Supplementary Text S3 of the original MAVISp article. In brief, we carried out an ROC analysis against the scores for known pathogenic and benign variants in ClinVar with review status higher than 2. For applicative purposes, one could design other strategies to analyze the MAVISp data too; it is not limited to the workflow we decided to set as the primary one for our focused studies, as already mentioned above.

      We have now also included classification based on the GMM model applied to GEMME scores for POLE and POLD1, so it can be evaluated against other designs for our protein of interest (see Table 1 in the revised version). The method section has been revised to include this part, and the ProteoCast pre-print is cited as a reference. We have not yet officially included this classification in the MAVISp database because we must first follow internal protocols to meet the inclusion criteria for new methods or analyses. We will do so by performing a similar comparison on the entire MAVISp dataset and focusing on high-quality variants, as ClinVar annotations, as we did to set the current thresholds for GEMME in Supplementary Table S3 of the original MAVISp article. We need to allocate time and resources to this pilot, which is scheduled for Q1 2026.
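
      As an illustration of how an ROC analysis can turn continuous predictor scores into a single classification cutoff, here is a minimal sketch. It assumes the cutoff is chosen by maximizing Youden's J (sensitivity + specificity - 1), that higher scores mean more damaging, and that labels are 1 for pathogenic and 0 for benign; none of these choices is confirmed by the response, and the scores, labels, and the function name `best_threshold` are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J over observed scores.

    A variant is called damaging when score >= threshold.
    Labels: 1 = known pathogenic, 0 = known benign (toy convention).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = (None, -1.0)
    for t in sorted(set(scores)):
        # True and false positives when calling score >= t damaging
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives  # sensitivity - (1 - specificity)
        if j > best[1]:                      # keep the lowest cutoff on ties
            best = (t, j)
    return best


# Hypothetical toy data, not taken from ClinVar or MAVISp
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, youden_j = best_threshold(toy_scores, toy_labels)
print(threshold, youden_j)
```

      For predictors where lower scores indicate a more damaging variant, the scores would need to be negated before applying the same procedure.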

      On the presentation and impact of the results

      • While reading the manuscript, it is difficult to grasp the main messages. The text contains abundant discussion about the potential caveats of the framework, the care that should be taken in interpreting the results, and the dependency on the clinical context. Although these aspects are certainly important, this extensive discussion (spread throughout the manuscript) obscures the results. Moreover, the way variants are catalogued throughout the text makes it difficult to grasp key highlights. The reader is left unsure about whether the framework can actually help the clinical practitioners.

      We have revised the text to make it easier to read, including additional MD simulations of three variants of interest and more downstream analyses to clarify the mechanisms of action. We also added a recap of the most interesting variants and their associated mechanisms, along with the ranking of the variants using the different features available in the MAVISp csv file for the VUS. We hope that this makes it more accessible and valuable. In the original publication, Table 2 aimed to provide a summary of the interesting variants, and we have revised it now in light of the ranking results and the additional analyses that allow us to clarify the mechanisms of action further. We have also introduced Figure 9 and Tables S3 and S4, which present data on ACMG-like classification for VUS that can fall into the likely pathogenic or benign categories.

      • In many cases, the authors state that experimental validation is required to validate the results. Could they be more explicit on the experimental design and the expected outcome?

      We have added a section on the point above at pages 21 and 30, where, alongside the summary of mechanisms per variant, we propose the experimental readouts to use based on known MAVE assays or assays that could be designed.

      • AlphaMissense seems to tend to over-predict pathogenicity. Could the authors comment on that?

      We are unsure whether this comment relates to our specific case or to a general feature of AlphaMissense.

      In the latest iteration of our small benchmarking dataset for POLE and POLD1 (as shown in the paper), we achieve a sensitivity of 1 and a balanced specificity of 0.96 for AlphaMissense, which suggests that AlphaMissense does not over-predict pathogenicity very significantly in these proteins, predicting true negatives (i.e., non-pathogenic) mutations quite accurately. As performance was sufficient in our case, we deemed recalibrating the classification threshold for AlphaMissense unnecessary.

      We are aware that this is not necessarily the case for every gene, e.g., it has been shown that AlphaMissense shows lower specificity in some cases (see e.g. 10.3389/fgene.2024.1487608, 10.1038/s41375-023-02116-3). This is also why we found it essential to evaluate its performance with its recommended classification on a gene-specific basis, as done here. In the future, we will keep a critical eye on our predictors to understand whether they are suitable for the specific case of study, or whether they require threshold recalibration or the use of a different predictor.
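
      For readers less familiar with the metrics quoted in this response, the sketch below shows how sensitivity and specificity follow from a confusion matrix. It uses the plain textbook definitions; the response does not specify how its "balanced" specificity was computed, and the counts below are hypothetical, chosen only to give values of the same form, not taken from the manuscript.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Textbook sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants called damaging
    specificity = tn / (tn + fp)  # fraction of benign variants called not damaging
    return sensitivity, specificity


# Hypothetical counts: every pathogenic variant is recovered,
# and 1 of 25 benign variants is wrongly called damaging.
sens, spec = sensitivity_specificity(tp=10, fn=0, tn=24, fp=1)
print(sens, spec)
```

      A predictor that over-calls pathogenicity would show up here as a large `fp` count and therefore a low specificity, which is the gene-specific check described above.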

      On specific variants

      • The mention of H1066R, H1068, and D1068Y is very confusing. There seems to be a confusion between residue numbers and amino acid types.

      We have revised the text for typos and errors. This part of the text changed, so these specific variants are no longer mentioned.

      • A major limitation of the 3D modeling is the impossibility of including Zn2+ coordination by cysteine residues. This limitation holds for both POLE and POLD1. Could the authors comment on the implication of this limitation for interpreting the mechanistic impact of variants? In particular, there are several variants reported in the study that consist of a gain of cysteines. The authors discuss the potential impact of some of these mutations on the structural stability but not on Zn coordination or the formation of disulphide bridges.

      This is a great suggestion. We had long planned to include a module to tackle changes in cysteines, and we have now used this occasion to add one. The new module identifies mutations that 1) are likely to disrupt native disulphide bridges, which are annotated as damaging, or 2) could form de novo disulphide bridges when a residue is mutated to a cysteine, which are also annotated as damaging with respect to the original functionality. We also included a step that evaluates whether the protein target is eligible for the analysis based on its cellular localization, since the redox conditions in specific compartments (such as the nucleus) would not favour disulphide bridges. The module has been added to MAVISp, and we are collecting data with the module for the existing entries in the database to be able to release them at one of the following updates. More details are on the website in the Documentation section (https://services.healthtech.dtu.dk/services/MAVISp-1.0/). We could not apply the module to POLE and POLD1 since they are nuclear proteins, and it would not be meaningful to look into this structural aspect either in connection with loss of native cysteines or de novo disulphide bridge formation upon mutations that change a wild-type residue to a cysteine.

      We would like to clarify that the structures we use, as it is a focused study rather than high-throughput data collection for the first inclusion in the MAVISp database, have been modelled with zinc at the correct position. It is just the first layer of high-throughput collection with MAVISp, which uses models without cofactors unless the biocurator attempts to model them or we move to collect further data for research studies (as done here). Prompted by this confusion, we have now added a field to the metadata of a MAVISp entry indicating the cofactor state. Nevertheless, the RaSP stability prediction does not account for the cofactor's presence, even when it is bound in the model. This is discussed in the Method Section. We thus did not further analyze the variants in sites directly coordinating the metal groups due to these limitations.

      • MAVISp does not identify any mechanistic effect for a substantial portion of variants labelled as pathogenic. Could the authors comment on this point?

      We are not sure how to interpret this question, as it can be read in two ways. Either the reviewer is asking about the known pathogenic ClinVar variants without mechanistic indicators or, more generally, about the variants that we label “pathogenic” in discovery mode (which we more usually refer to as damaging in the dotplots) and for which we cannot associate a mechanism.

      Overall, as a general consideration, it would be challenging to envision a mechanism for each variant predicted to be functionally damaging. For example, in the case of POLE and POLD1, we still lack models of complexes that did not meet the quality-control and inclusion criteria for the binding-free-energy scheme used by the LOCAL INTERACTION module. Also, when it comes to effects on catalysis or to analyzing effects in more detail at the cofactor sites, we could miss effects that would require QM/MM calculations. Other points we have not yet covered include cases related to changes in protein abundance due to degron exposure for degradation, which is one of the mechanistic indicators we are currently developing. Moreover, we used only unbiased molecular simulations of the free protein, and we would need future studies with enhanced sampling approaches and longer timescales to better address conformational changes and changes in the population of different protein conformational states induced by the mutation (including DNA). This can be handled formally by the MAVISp framework using metadynamics approaches, but it would be outside the scope of this work and is a direction for future studies on a subset of variants to investigate in even greater detail.

      Furthermore, modifications related to PTM differ from phosphorylations. Anyway, our scope is to use the platform to provide structure-based characterization of either known pathogenic variants or potentially damaging ones predicted by VEPs, and focus on more detailed analyses of those. As we develop MAVISp further and design new modules, we will also be able to tackle other mechanistic aspects. This discussion, however, is more relevant to the MAVISp method paper itself.

      Moreover, none of the variants discussed are associated with allosteric effect. Is this expected?

      In general, allosteric mutations are rare. Nevertheless, in these case studies, the size of the proteins under investigation also poses some challenges for the underlying coarse-grain model used in the simple mode to generate the allosteric signalling map, as we have found it performs best on protein structures below 1000 residues.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript utilized the MAVISp framework to characterize 64,429 missense variants (43,415 in POLE and 21,014 in POLD1) through computational saturation mutagenesis. The authors integrate protein stability predictions with pathogenicity predictors to provide mechanistic insights into DNA polymerase variants relevant to cancer predisposition and immunotherapy response. There are discussions of known PPAP-associated variants and somatic cancer mutations in the context of known data and some proposed variants of interest (which are not validated).

      Major comments:

      I was unaware of the MAVISp framework. It concerns me that albeit this paper has a lot of technical details about the framework, it's not the paper about the framework. I did look into the paper https://www.biorxiv.org/content/10.1101/2022.10.22.513328v5 which keeps being updated (version five now) for three years, but I do not see a peer reviewed version. It would be unfair of me to peer review the underlying framework of the work but together with the previous comments, I am a bit concerned.

      We have intentionally kept the MAVISp resource paper as a living preprint until the database contained enough data to be useful to the rest of the community. We have been actively revising the manuscript, thanks to comments from users on previous versions, to ensure it provides a solid resource. Approximately one and a half years ago, we attempted a submission to a high-impact journal and addressed the reviewers’ comments there. Still, we did not receive feedback for a long time, and ultimately the manuscript was not sent back to the reviewers despite more than six months of work on our side. After that, we realized that we would benefit from collecting a larger dataset; we invested time and effort in that and submitted again, this time through Review Commons, in the summer of 2025. The paper has since been peer-reviewed by three reviewers through Review Commons. We submitted the revised version and the response to reviewers, and it is now under revision with Protein Science. The reviewers’ comments and our responses can be found under “Latest Refereed Preprints” on the Review Commons website, dated 17 October 2025.

      We would also like to clarify another point on this. In our experience, it is common practice to keep software on bioRxiv, even for a long time, and to bring it to a more complete form in parallel with the community already applying it. This allows broad feedback from peers. We had similar experiences with MoonlightR, where the first publications with applications within the TCGA-PanCancer papers came before the publication of the tool itself, and the same has been true for our main workflows, such as MutateX or RosettaDDGPrediction, which are widely used by the community. Finally, the MAVISp framework has already been used in several published peer-reviewed studies (since 2023), attesting to its integrity and potential. Here, the reviewer can read more about the studies that used MAVISp data or modules: https://elelab.gitbook.io/mavisp/overview/publications-that-used-mavisp-data

      For example, the authors are using AlphaFold models to predict DDG values. Delgado et al. (2025, Bioinformatics) explicitly tested FoldX on such models and concluded that "AlphaFold2 models are not suitable for point mutation ΔΔG estimation" after observing a correlation of 0.06 between experimental and calculated values. AlphaFold's own documentation states it "has not been validated for predicting the effect of mutations". Pak et al. (2023, PLOS ONE) showed correlation between AlphaFold confidence metrics and experimental ΔΔG of -0.17. Needless to say that these concerns seriously undermine the validity of a major part of the study.

      We appreciate the reviewer’s comments and would like to clarify a point regarding the MAVISp STABILITY module, which we believe may have been misunderstood. Based on the studies cited by the reviewer, which critique the use of AF-generated mutant structures for assessing stability effects, we understand that this assumption may have led to the concern.

      The STABILITY module utilises three in silico tools (FoldX, Rosetta, and RaSP) to assess changes in protein stability resulting from missense mutations. Importantly, the input to these assessments consists of AF models of the WT protein structures, not of AF-generated mutant structures. The mutants are generated using the FoldX and Rosetta protocols, along with estimates of the changes in free energy. For further details and clarification, we kindly refer the reviewer to the MAVISp original publication.

      Also, one should consider the goal of our use of free energy calculations: it is not to identify exact ΔΔG values but to obtain predictions that correlate with data from in vitro biophysical experiments or from cellular experiments such as MAVEs. We and other researchers have shown good agreement of this kind (see the PTEN case study in the original MAVISp publication, as well as https://pmc.ncbi.nlm.nih.gov/articles/PMC5980760/, https://pubmed.ncbi.nlm.nih.gov/28422960/, and 10.7554/eLife.49138). Also, before even designing the STABILITY module for MAVISp, we had verified that we can use WT structures from AlphaFold (upon proper trimming and quality control with PROCHECK) instead of experimental structures without compromising accuracy; this is reported in the publications of the two main protocols of the STABILITY module, MutateX and RosettaDDGPrediction, and in a case study on p53 (https://doi.org/10.1093/bib/bbac074, https://doi.org/10.1002/pro.4527). In the focused studies, we also carefully consider whether the prediction is at a site with a low pLDDT score, or surrounded by other sites with a low pLDDT score, before reaching any conclusions. The pLDDT score is reported in the MAVISp CSV file precisely so that it can be used to flag variants or look at them more closely, as we discuss in this study (see, for example, Figure 2). Additionally, it should be noted that we employ a consensus approach across the two classes of methods in MAVISp to account for the limitations arising from their empirical energy functions or backbone stiffness. Furthermore, in the focused studies, we also collected molecular dynamics simulations for the ensemble mode and reassessed the stability on different conformations from the trajectory to compensate for the backbone stiffness of the FoldX, RaSP, and Rosetta ΔΔG protocols.
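
      As an illustration only, the consensus-plus-pLDDT logic described above could be sketched as follows. The cutoffs, class names, and function names are assumptions made for this sketch and are not taken from the MAVISp code base; the response itself does not state which thresholds are used.

```python
from typing import Optional

# Hypothetical cutoffs (kcal/mol for ddG, 0-100 for pLDDT); assumptions for illustration.
DESTABILIZING = 3.0
STABILIZING = -3.0
NEUTRAL_BAND = 2.0
LOW_PLDDT = 70.0


def classify_ddg(ddg: float) -> str:
    """Map one predicted ddG value to a coarse class."""
    if ddg >= DESTABILIZING:
        return "destabilizing"
    if ddg <= STABILIZING:
        return "stabilizing"
    if abs(ddg) < NEUTRAL_BAND:
        return "neutral"
    return "uncertain"


def consensus_stability(ddg_a: float, ddg_b: float,
                        plddt: Optional[float] = None) -> dict:
    """Keep a class only when both methods agree; flag low-confidence sites."""
    class_a, class_b = classify_ddg(ddg_a), classify_ddg(ddg_b)
    label = class_a if class_a == class_b else "uncertain"
    flagged = plddt is not None and plddt < LOW_PLDDT
    return {"label": label, "low_plddt": flagged}
```

      In this sketch a variant keeps a stability class only when the two estimates agree, and the pLDDT flag is reported separately so that a reader can decide whether to look at the site more closely.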

      I have to add that this is also true for the technical choices: Several integrated predictors (DeMaSk, GEMME) are outperformed by newer methods according to benchmarking studies (https://www.embopress.org/doi/full/10.15252/msb.202211474). AlphaMissense, while state-of-the-art, shows substantial overcalling of pathogenic variants. Could ensemble meta-predictors (REVEL, BayesDel) improve accuracy?

      The MAVISp framework includes REVEL as one of the VEPs available for data analysis, so ensemble meta-predictors are already represented. This is explained in the original MAVISp paper. We were not aware of BayesDel, which we will consider for one of the next pilot projects to assess new tools for the framework (see more details below on how we generally proceed). Currently, we cannot use REVEL for all variants because we do not necessarily have genomic coordinates for them. We retrieve genomic-level variants corresponding to our protein variants from mutation databases, where available (e.g., ClinVar, COSMIC, or cBioPortal). However, as we strive to cover every possible mutation, several of the variants in MAVISp are not in these databases, which means we do not have the corresponding genomic variation for them, limiting our ability to annotate them with VEPs. In the future (see GitHub issue https://github.com/ELELAB/cancermuts/issues/235), we will revise the code to identify the genomic variants that could give rise to each protein mutation of interest, thereby increasing the coverage of VEP annotations.
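
      To make the planned step concrete, the core of "identifying the genomic variants that could give rise to a protein mutation" can be sketched at the codon level. This is a minimal sketch, not the cancermuts implementation: the function name is hypothetical, only single-nucleotide changes are considered, and the mapping from codon to genomic coordinates (transcript choice, strand, exon boundaries) is deliberately left out.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in TCAG order (DNA alphabet, '*' marks a stop codon).
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def snv_codons(ref_codon: str, target_aa: str) -> list:
    """Return codons one nucleotide away from ref_codon that encode target_aa."""
    hits = []
    for pos in range(3):
        for base in BASES:
            if base == ref_codon[pos]:
                continue  # skip the reference base itself
            alt = ref_codon[:pos] + base + ref_codon[pos + 1:]
            if CODON_TABLE[alt] == target_aa:
                hits.append(alt)
    return hits
```

      For a phenylalanine codon `TTT`, `snv_codons("TTT", "S")` returns the serine codons reachable by one substitution; an empty list means the amino-acid change needs more than one nucleotide change from that codon, so no single-nucleotide genomic variant exists for it.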

      We can see from the work cited by the reviewer that ESM-1v, EVE, and DeepSequence are among the top performers, whereas reviewer 2 cited another work in which GEMME outperforms EVE. We have been covering all of them, except ESM-1v, in our framework. We are planning to evaluate some of the new top-performing predictors, including ESM-1v, for inclusion in MAVISp in Q2 2026 (according to the protocol described later in this answer), which is why ESM-1v is not available yet.

      In our discovery protocol (i.e., when we work on VUS or variants not classified in ClinVar), we generally use AlphaMissense as the first indicator of potentially damaging variants. EVE, REVEL, or GEMME can be used when AlphaMissense data are missing, or as a second layer of evidence when we want, for example, to select a smaller pool of variants for experimental validation in a protein target with too many uncharacterized variants and too many that pass the evaluation with our discovery workflow. Finally, we rely on DeMaSk, as it also provides information on possible loss- or gain-of-fitness signatures, to further filter the variants of interest in the search for mechanistic indicators. Since the MAVISp framework is modular, other users may want to use the data differently and design a different workflow. They have access to the scores and classifications through the web portal. Combining AlphaMissense with DeMaSk further filters the variants and mitigates the risk that AlphaMissense over-predicts pathogenicity.
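
      The layered filtering described above can be pictured with a small sketch. The dictionary keys, the class labels (`"damaging"`, `"loss_of_fitness"`, `"gain_of_fitness"`), the fallback order, and both function names are assumptions introduced for illustration; they do not reproduce the column names or logic of the MAVISp outputs.

```python
from typing import Optional

# Fallback order among the other VEPs is an assumption of this sketch.
FALLBACK_VEPS = ("EVE", "REVEL", "GEMME")
FITNESS_SIGNATURES = {"loss_of_fitness", "gain_of_fitness"}


def first_layer_call(calls: dict) -> Optional[str]:
    """Use the AlphaMissense call when present, otherwise the first available fallback."""
    for vep in ("AlphaMissense",) + FALLBACK_VEPS:
        call = calls.get(vep)
        if call is not None:
            return call
    return None


def prioritize(calls: dict, demask: Optional[str]) -> bool:
    """Retain a variant when the first-layer VEP calls it damaging and
    DeMaSk reports a loss- or gain-of-fitness signature."""
    return first_layer_call(calls) == "damaging" and demask in FITNESS_SIGNATURES
```

      A variant called benign by AlphaMissense is dropped here even if a fallback predictor disagrees, which mirrors the "first indicator" role described in the text; a workflow that uses the other VEPs as a second layer of evidence would need an additional rule.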

      In general, we work to keep MAVISp up-to-date, and we have developed a protocol for including new methodologies in the available modules before generating and releasing data with new tools in the database. In particular, we perform comparative studies using data already available in the database to evaluate the performance of new approaches against that of the tools already included. Depending on the module, we use different gold standards, which we are curating in parallel and which are appropriate for that specific module. For example, to evaluate a VEP, we would compare it against ClinVar known variants with good review status. If the VEP performs better than the currently included ones, we can include it as an additional source of annotations and evaluate whether we could change the protocol for the discovery/characterization of variants. We operate similarly for the structural modules. For example, for stability, we are importing experimental data from MAVE assays on protein abundance and using them as a gold standard against which we evaluate new approaches relative to the current FoldX and Rosetta-based consensus for changes in folding free energies. If we find evidence that switching to a new method or integrating it would be beneficial, we will do so as a result of these investigations. An example of our working mode for evaluating tools for inclusion in the framework is how we handled the comparison between RaSP and Rosetta in the original MAVISp article (Supplementary file S2) before officially switching to RaSP for high-throughput data collection. We still maintain Rosetta, especially in focused studies, to further validate variants classified as uncertain.
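
      A comparison of a candidate predictor against a gold-standard set, as outlined above, could look like the sketch below. The response does not say which performance metric is used, so the ROC AUC here is an assumed choice, and the function names and the `margin` parameter are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a pathogenic variant outscores a benign one
    (ties count one half). labels: 1 = pathogenic, 0 = benign."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def better_candidate(candidate_scores, current_scores, labels, margin=0.0):
    """True when the candidate's AUC exceeds the current tool's by more than margin."""
    gain = roc_auc(candidate_scores, labels) - roc_auc(current_scores, labels)
    return gain > margin
```

      Both predictors must be scored on the same labelled variants, and higher scores are assumed to mean "more likely damaging"; a predictor with the opposite orientation would need its scores negated first.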

      Further, I found the web site of the framework, where I looked for the data on these models, rather user unfriendly. Selecting POLD1, POLD2, or POLE tells me I am viewing entries A2ML1, ABCB11, ABCB6 respectively, when I search for POL and then click: these are the first three entries of the table, not what I clicked on. Displaying the whole table and clicking on POLD1 gets me to POLD1. However, when I selected "Damaging mutations on structure" I get "Could not fetch protein structure model from the AlphaFold Protein Structure Database". Many other features are not working (Safari or Chrome, on a Mac). That is a concern for the usability of the dataset.


      We have been able to reproduce the bugs identified by the reviewer and have fixed them. The second was connected to recent updates to the AlphaFold Protein Structure Database. We are not sure how to act on the “other features that are not working” because the comment lacks specificity. Still, we have worked to make the website more robust: the coauthors of this work and other colleagues in the MAVISp team have extensively tested it across different proteins and with various browsers and operating systems, and we have fixed all identified issues. We also have a GitHub repository where users can open issues to report problems they experience with the website, which we will fix as promptly as we can (https://www.github.com/ELELAB/MAVISp), as we do for any of the tools we develop and maintain. If the reviewer comes across other specific problems with the website, we recommend (anonymously) opening issues on the MAVISp repository so that they can be described in more detail and dealt with appropriately.

      This comment seems more related to the MAVISp paper itself than to the POLE and POLD1 entries. We have made several revisions to the web app to improve it over time. We suspect that the reviewer consulted it during one of these changes, and we hope it works better now. For POLE and POLD1, the CSV files were, in any case, also available through the MAVISp website itself (https://services.healthtech.dtu.dk/services/MAVISp-1.0/), as well as in the OSF repository connected to this paper (https://osf.io/z8x4j/overview), in case the reader needed to consult them or use them as a reference for the analyses reported in this paper.

      Albeit this is a thorough analysis with the existing tools, and the authors make some sparse attempts to put the mutants classification in context with examples, the work stays descriptive for known effects in literature, or points out that e.g. "further functional and in vitro assays are required". The examples are not presented in a systematic way, or in an appealing manner. Thus, what this manuscript adds to the web site is unclear. It is a description of content, which could be at least more appealing if examples would be more clearly outlined in a conceptual framework, and illustrated more consistently. For example I read in the middle of page 16 "One such example is the F931S (p.Phe931Ser) variant (Figure 5A)" and then I see "F931 forms contacts with D626, a critical residue for the coordination of Mg2+ which is essential for the correct orientation of the incoming nucleotide (Figure XXX)". Figure 5B is not XXX as this has just many mutations labeled. These issues are very discouraging. I would recommend to put much more effort in examples, put them in clearer paragraphs, and describe results rather than the methodology. Doing both in an intermingled way, clearly does not work for me.

      We have revised the storyline to make it more straightforward for the reader, focusing on the essential messages and avoiding excessive description in the results section, instead conveying the key points directly. We also included new simulation data on three variants and downstream analyses of other variants. We revised the section to focus less on methodologies and more on the actual biological results. We have also added a ranking approach for the VUS and an ACMG-like classification to facilitate the identification of the most important results.

      Additionally, we included a summary Table (Table 2) and Figure 9 that present the main findings on the VUS, and we discussed in the text the possible associated experimental validation.

      We also do not fully understand the reviewer’s comment “the work stays descriptive for known effects in literature”. We agree that we should make a better effort to write the results in a logical and easy-to-follow manner, without risking the reader getting lost in too many details, and with more dedicated subsections. However, the paper does not describe just known effects in the literature. In the previous version, we had a section aimed at identifying mechanistic indicators for ClinVar-reported variants that are also (in some cases) functionally characterized. This section covers known variants, but it is only the very first part of the results, and it still adds structure-based knowledge about these variants. After this, we also reported predicted results with mechanisms for VUS and variants in other databases. We took the opportunity in this revised version to elaborate more on the results for the variants reported in COSMIC and cBioPortal.

      We are afraid that we also do not fully understand the reviewer's comment that “Thus, what this manuscript adds to the website is unclear.” We have generated POLE and POLD1 data with the MAVISp toolkit in both ensemble and simple mode, and the whole pool of local interactions with other proteins and DNA, specifically for this publication. It should be acknowledged that we have generated new data in ensemble mode, which relies on all-atom microsecond molecular dynamics simulations, and additional modules for the simple mode, including calculations with the flexddg protocol of Rosetta, which is also computationally demanding, to provide a comprehensive overview of the effects of variants in POLE and POLD1. The two proteins were available in the database only in simple mode with the basic default modules, and the remaining data were collected for this research article. This can also be inferred from the references in the CSV file of the ensemble mode, which refer only to the DOI of the preprint of this article. This entails a substantial effort in computing and analysis. The website is the repository for data that researchers collect using the MAVISp protocols or modules; in our opinion, it cannot replace a research project. We designed the database to store the data generated by the framework for others to consult and use for various purposes (e.g., biological studies, preparing datasets for benchmarking approaches against existing ones, or using features for machine learning applications). The entry point in the database is the simple mode, along with some compulsory modules (VEPs, STABILITY, PTM, EFOLDMINE, SASA). After this initial entry point, a biocurator or a team of researchers can decide to expand data coverage by moving into the other modules. Still, at some point, one would need to design focused studies to have a comprehensive overview of the effects on specific targets, as we did here, or, for example, in the publication https://doi.org/10.1016/j.bbadis.2024.167260.

      Furthermore, there are analyses here, especially in the simulations, that are not directly available from consulting the database; in these cases, one needs to use other resources beyond MAVISp to investigate further the mechanisms underlying the predicted mechanistic indicators. We also included simulations of mutant variants to validate the hypotheses further. Another example is the analysis of the effects on splicing sites, which is not covered by a structure-based framework such as MAVISp but is still an essential aspect of the analysis of the variants' effects.

      Will the community find this analysis useful?

      The analysis provided here will be helpful, especially for researchers interested in experimental studies of these enzymes, because the study gives them an extensive portfolio of structural data to consult, including a list of variants ranked by class of effect. We originally started designing MAVISp because we realized it was needed by our experimental collaborators, both in cellular biology and in more clinical research, whenever they needed to predict or simulate variants, and we expanded the concept into a robust, versatile framework for broader use. Especially for genes where extensive MAVE data are not available (as in this case), having a set of variants to test experimentally is crucial support, as it provides the potential mechanism behind each predicted damaging variant.

      How many ClinVar VUS could be reclassified using MAVISp data under current ACMG/AMP guidelines?


      The ACMG/AMP variant classification guidelines, to the best of our knowledge, include computational evidence (PP3/BP4) and well-established functional studies (PS3/BS3). Because MAVISp provides multi-level mechanistic predictions derived from structural modelling, these data formally fall within the PP3/BP4 computational category. They cannot be used to reclassify ClinVar VUS independently under ACMG/AMP rules. Reclassification is not the goal of our framework, which is to provide a structure-based approach for investigating potentially damaging variants predicted by VEPs. However, the reviewer's suggestion is something we had also wanted to explore in general with MAVISp data but had not yet done because of a lack of time. We have now checked the requirements for PP3, BP4, and PM1 and developed a classifier for VUS reported in ClinVar, using MAVISp features in accordance with the ACMG/AMP guidelines. Using ClinVar pathogenic and benign variants with at least a review status of 1 for calibration, we obtained thresholds for all MAVISp-supported VEPs (REVEL, AlphaMissense, EVE, GEMME, and DeMaSk). These thresholds were then applied to all ClinVar VUS to determine PP3 (pathogenic-supporting) and BP4 (benign-supporting) evidence. In parallel, we constructed a PM1-like mechanistic evidence category that integrates MAVISp structural stability, protein–protein interactions, DNA interactions, long-range allosteric paths, functional sites, and PTM-mediated regulatory effects. Variants classified as damaging in MAVISp according to such criteria were assigned PM1-like support. These evidence tags provide mechanistic insight to support VUS classification for polymerase proofreading genes. The workflow and complete annotated VUS table are now included in the revised manuscript and in the OSF repository. Although these findings cannot formally reclassify variants under ACMG/AMP criteria, they provide prioritization for PS3/BS3 experimental validation and highlight variants that are likely to be reclassified once supporting functional evidence becomes available.
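
      The calibrate-then-tag idea can be illustrated with the sketch below. The response does not state how the thresholds were derived, so choosing the cutoff that maximizes Youden's J is a stand-in technique assumed for this example, as are the function names, the single two-sided cutoff, and the boolean `mechanistic_damaging` input.

```python
def calibrate_threshold(scores, labels):
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1.
    labels: 1 = pathogenic, 0 = benign; both classes must be present."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_thr = -1.0, None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_j, best_thr = j, thr
    return best_thr


def evidence_tags(score, threshold, mechanistic_damaging):
    """Tag a VUS with PP3 or BP4 from the calibrated VEP cutoff, plus a
    PM1-like tag when a structural mechanism is predicted to be damaging."""
    tags = ["PP3" if score >= threshold else "BP4"]
    if mechanistic_damaging:
        tags.append("PM1-like")
    return tags
```

      A single cutoff forces every variant into PP3 or BP4; a calibration that leaves an indeterminate range between a benign and a pathogenic threshold would need two cutoffs and a third "no evidence" outcome.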

      How do MAVISp predictions meet calibrated thresholds, as in https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-023-01234-y, for the exonuclease domain of POLE and POLD1?


      Mur et al. (Genome Medicine 2023) restricted their ACMG/AMP recommendations to the exonuclease domain (ED) because (i) nearly all known pathogenic germline variants in POLE/POLD1 cluster within the ED, (ii) the ED has a well-characterised structure–function architecture, and (iii) sufficient pathogenic and benign variants exist only within the ED to support empirical calibration. To mirror this approach, we performed the calibration workflow exclusively on ED variants (POLE residues 268–471; POLD1 residues 304–533). For these ED-restricted variants, we recalibrated all MAVISp-derived computational predictors (REVEL, AlphaMissense, EVE, GEMME, DeMaSk) using ClinVar P/LP and B/LB variants. We applied the resulting POLE/POLD1-specific thresholds to all ClinVar VUS within the ED. We also applied our PM1-like structural/functional evidence exclusively to ED variants. The results of this ED-specific analysis are now reported in the revised manuscript (Figure 9 and Supplementary Tables S3 and S4), as also explained in the response to the previous question. This ensures that MAVISp predictions are applied in a manner that is consistent with the principles of Mur et al. and ACMG/AMP variant interpretation.
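
      Restricting a variant list to the exonuclease domain is a simple positional filter; a minimal sketch is given below. The residue ranges are the ones quoted above, while the function name, the one-letter `F931S`-style variant format, and the variant strings in the tests are assumptions for illustration.

```python
import re

# Exonuclease-domain residue ranges as quoted in the response above.
ED_RANGES = {"POLE": (268, 471), "POLD1": (304, 533)}
# One-letter missense notation, e.g. "A300V" (assumed input format).
_VARIANT = re.compile(r"^[A-Z](\d+)[A-Z]$")


def in_exonuclease_domain(gene: str, variant: str) -> bool:
    """Return True when the mutated residue lies inside the gene's ED range."""
    match = _VARIANT.match(variant)
    if match is None:
        raise ValueError(f"unrecognized variant format: {variant!r}")
    if gene not in ED_RANGES:
        raise ValueError(f"no exonuclease-domain range defined for {gene!r}")
    start, end = ED_RANGES[gene]
    return start <= int(match.group(1)) <= end
```

      Both range boundaries are treated as inclusive, which is an assumption; the text gives the ranges without saying whether the end residues belong to the domain.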

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI, and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities, they used a so-called stacking approach, which employs two levels of machine learning. First, they built a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they used predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.
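
      For readers unfamiliar with commonality analysis, the two-predictor-set case reduces to simple arithmetic on three R² values, sketched below. The function names and the numbers in the usage note are made up for illustration, and `shared_fraction` encodes only one plausible reading of "proportion of the relationship captured"; the summary does not spell out the exact definition used in the study.

```python
def commonality(r2_a: float, r2_b: float, r2_ab: float) -> dict:
    """Partition the joint R² of two predictor sets into unique and common parts.

    r2_a, r2_b : R² of the models using set A alone and set B alone
    r2_ab      : R² of the model using both sets together
    """
    return {
        "unique_a": r2_ab - r2_b,       # variance only A explains
        "unique_b": r2_ab - r2_a,       # variance only B explains
        "common": r2_a + r2_b - r2_ab,  # variance explained by either set
    }


def shared_fraction(r2_a: float, r2_b: float, r2_ab: float) -> float:
    """Share of set A's explained variance that is common with set B."""
    return commonality(r2_a, r2_b, r2_ab)["common"] / r2_a
```

      With hypothetical values of 0.30 for mental health alone, 0.20 for neuroimaging alone, and 0.40 for both together, the common part is 0.10, so about one third of the variance explained by mental health would be shared with neuroimaging.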

      Strengths:

      (1) A big study population (UK Biobank with 14000 subjects).

      (2) The description of the methods (including Figure 1) is helpful in understanding the approach.

      (3) This revised manuscript is much improved compared to the previous version.

      Weaknesses:

      (1) Although the background and reason for the study are better described in this version of the manuscript, the relevance of the question is, in my opinion, still questionable. The authors aimed to determine whether neural markers of cognition explain the covariance between cognition and mental health and which of the 72 MRI-based features contribute to explaining most of the covariance. I would like to invite the authors to make a stronger case for the relevance, keeping the clinical and scientific relevance in mind (what would you explain to the clinician, what would you explain to the people with lived experience, and how can this knowledge contribute to innovation in mental health care?).

      Thank you for this insightful observation. We agree that establishing the real-world significance of fundamental research is paramount, and we have revised our manuscript to better articulate this relevance.

      For clinicians, our work (a) corroborates the link between cognition and mental health, confirming the transdiagnostic role of cognition, and (b) demonstrates that current neuroimaging tools can capture the neurobiology underlying this relationship. These findings offer several implications for clinical practice. First, they support the development of interventions aimed at enhancing cognitive functioning as a pathway to improving mental health. Second, our work introduces neuroimaging as a potential tool for assessing the neurobiological basis of the cognition–mental health connection. With further research, clinicians may be able to use neuroimaging to track cognitive changes at the neural level, which could help monitor treatment efficacy for interventions (e.g., stimulant medications for ADHD) designed to boost cognitive functioning.

      Following your suggestions, we have expanded the Discussion (Line 684) to include future directions and clinical perspectives on the findings.

      Line 684: “Neuroimaging offers a unique window into the biological mechanisms underlying cognition–mental health overlap – insights unattainable from behavioural data alone. Our findings validate brain-based neural markers as a core unit of analysis for cognitive functioning, advancing mental health research through the lens of cognition. Beyond this conceptual contribution, the study has clinical implications. First, by demonstrating a transdiagnostic link between cognition and mental health, we support interventions that enhance cognition as a pathway to improving mental health. Second, we show neuroimaging as an effective tool for assessing the neurobiological basis of this link. Quantifying neuroimaging’s capacity to capture this relationship is essential for future research integrating imaging with cognitive testing to monitor treatment-related neural changes. Such work could enable personalised interventions, using neuroimaging to track cognitive changes and treatment efficacy (e.g., stimulant medications for ADHD) aimed at boosting cognitive functioning.”

      (2) The discussion on the interpretation of the positive and negative PLSR loadings is not very convincing, and the findings are partly counterintuitive. For example: (1) how to explain that distress has a positive loading and anxiety/trauma has a negative loading? (2) how to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma? From both a clinical and a neuroscientific perspective, this is hard to interpret.

      Thank you for pointing this out. We appreciate your concern regarding the interpretation of positive and negative PLSR loadings. To clarify:

      (1) The directions of PLSR loadings are broadly consistent with univariate correlations, suggesting that the somewhat counterintuitive relationships mentioned are present even when we apply simple univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. It constructs new components – linear combinations of predictors – that simultaneously explain variance in the predictors and their covariance with the response.

      (2) The positive loading of distress likely reflects cohort-specific questionnaire design in the UK Biobank, where feeling of distress was tied to seeking medical help. Individuals with higher cognition and socioeconomic status may be more likely to seek professional support, which explains the counterintuitive direction.

      (3) The negative loadings of wellbeing and happiness may also reflect cohort-specific effects, such as older age, and align with prior work linking excessive optimism to poorer reasoning and cognitive performance. This suggests that realism or pessimism may sometimes be associated with better cognition, particularly in older adults.
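
      The link between PLSR and univariate associations mentioned in point (1) can be seen in a bare-bones sketch of the first PLS component for a single outcome. This is not the analysis pipeline used in the study: the function name and toy data are illustrative, only the first component is computed, and it reports weights rather than loadings.

```python
import math


def first_pls_component(X, y):
    """First PLS1 weight vector and scores on mean-centered data.

    The weight vector is proportional to Xc^T yc, i.e. to the covariance of
    each feature with the outcome, normalized to unit length.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    cov = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in cov))  # zero if no feature covaries with y
    weights = [c / norm for c in cov]
    scores = [sum(w * x for w, x in zip(weights, row)) for row in Xc]
    return weights, scores
```

      For the first component, each weight carries the sign of that feature's univariate covariance with the outcome, which is why the two views tend to agree; loadings and later components are computed after deflation and can depart from the univariate pattern.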

      These points are discussed in detail in the manuscript (Lines 493–545). We have emphasised that some of these findings may be cohort-specific and cited supporting literature, as seen below.

      (1) How to explain that distress has a positive loading and anxiety/trauma has a negative loading?

      Line 493: “The directions of PLSR loadings were broadly consistent with univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. Consistently, both univariate correlations and factor loadings derived from the PLSR model indicated that scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.”

      Line 529: “Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].”

      (2) How to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma?

      Line 545: “Finally, both negative PLSR loadings and corresponding univariate correlations for features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (3) The analysis plan has not been preregistered (e.g. at OSF).

      Note: the computational aspects of the methods fall beyond my expertise.

      Thank you for pointing this out. We acknowledge that the analysis plan was not preregistered, as our approach was primarily data‑driven rather than hypothesis‑driven. We essentially applied the machine learning approach to quantify the strength of the cognition-mental health relationship in relation to neuroimaging. To ensure transparency and reproducibility, we have made all analysis code and intermediate outputs publicly available on our GitHub repository (https://github.com/HAM-lab-Otago-University/UKBiobank/) within the constraints of UK Biobank’s ethical policy and provided a detailed description of each methodological step in the Supplementary Materials.

      Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation.

      Strengths:

      The evidence supporting the conclusions is compelling. There is a large sample (UK biobank data) and a clear description of advanced analyses.

      Weaknesses:

      In the previous version of the paper, it was not completely clear what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

      Thank you for your positive feedback and for recognizing the strengths of our work. We appreciate your comments and are happy that the revisions addressed your concerns.

    1. Author response:

      eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodeling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

      We would like to thank the editor and reviewers for their critical assessments of our manuscript. While we do acknowledge the limitations of our work, we believe that our results provide important mechanistic insights into the long-standing question of how H2A.Z is preferentially enriched in hypomethylated genomic DNA regions. First, our structural and biochemical data suggest that DNA methylation increases the openness and physical accessibility of H2A.Z nucleosomes, although the effect is relatively subtle and sequence-dependent. Second, using Xenopus egg extracts and synthetic DNA templates, we provide the first clear and direct evidence that DNA methylation-sensitive H2A.Z deposition is due to the H2A.Z chaperone SRCAP-C, corroborated by our discovery that SRCAP-C binding to DNA is suppressed by DNA methylation. Although the molecular details by which DNA methylation inhibits binding of SRCAP-C are an important area of future study, in our current manuscript we do provide evidence that directly links the presence of SRCAP-C to the establishment of the DNA methylation/H2A.Z antagonism in a physiological system. Thanks to criticisms by the reviewers, we realized that we did not clearly state in our Abstract that the impact of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle, although we did explain these observations and limitations in the main text. In our revised manuscript, we are willing to edit the text to better address the criticisms raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

      Although we acknowledge the limitations of our study and are willing to expand our discussion to more thoroughly discuss these points, we believe our manuscript provides several important mechanistic insights which this reviewer may not have fully appreciated.

      Our first conclusion that H2A.Z nucleosomes on methylated DNA are more open and accessible compared to their unmethylated counterparts is supported by both our cryo-EM study and the restriction enzyme accessibility assay. Although the physical effect of DNA methylation is relatively subtle and is likely sequence dependent, as we clearly noted within the manuscript, the difference does exist and is valuable information for the chromatin field at large to consider.

      The second major conclusion of our manuscript is that SRCAP-C exhibits preferential binding to unmethylated DNA over methylated DNA, and that SRCAP-C represents the major mechanism that can explain the biased deposition of H2A.Z to unmethylated DNA in Xenopus egg extracts. Furthermore, our experiments using Xenopus egg extract clearly demonstrated that H2A.Z is deposited by both DNA methylation-sensitive and -insensitive mechanisms. Depletion of SRCAP-C almost completely eliminated DNA methylation-sensitive H2A.Z deposition and reduced the total level of H2A.Z on chromatin to less than half of that seen in non-depleted extract. This result demonstrated that DNA methylation-sensitive H2A.Z loading is primarily regulated by SRCAP-C, at least in our experimental context where transcription, replication, and other epigenetic modifications are not involved. It is likely that additional mechanisms contribute further, as implied by our sequencing experiments, particularly at regions with active transcription, and we have noted these possibilities and the rationale for their existence in the Discussion.

      Our study also suggests that a SRCAP-independent, DNA methylation-insensitive mechanism of H2A.Z loading exists, which we suspect to be mediated by Tip60-C. In line with this possibility, our data suggest that Tip60-C binds DNA in a DNA methylation-insensitive manner in Xenopus egg extract. Since antibodies to deplete Tip60-C from Xenopus egg extract are currently unavailable, we were unable to directly test that hypothesis and decided not to include Tip60-C in our final model, as we lacked experimental evidence for its role. However, whether or not Tip60-C is the complex responsible for the DNA methylation-insensitive pathway does not influence our final conclusion that SRCAP-C plays a major role in DNA methylation-sensitive H2A.Z loading. We are planning to edit our manuscript to more comprehensively discuss these points.

      Please note that while Berta et al. reported that DNA methylation increases at H2A.Z loci in tumors defective in SRCAP-C, they selected those regions based on where H2A.Z is typically enriched within normal tissues (Berta et al., 2021). They did not show data indicating whether H2A.Z is still retained specifically at those analyzed loci upon mutation of SRCAP-C subunits. Thus, although we greatly admire their work and are pleased that many of our findings align with theirs, their paper did not directly address whether SRCAP-C itself differentiates between DNA methylation states, or the impact that this has on H2A.Z and DNA methylation colocalization. In contrast, our Xenopus egg extract system, where de novo methylation is undetectable (Nishiyama et al., 2013; Wassing et al., 2024), offers a unique opportunity to examine the direct impact of DNA methylation on H2A.Z deposition using controlled synthetic DNA substrates. Together with our demonstration that DNA binding of SRCAP-C is suppressed by DNA methylation, we believe that our manuscript provides a specific mechanism that can explain the preferential deposition of H2A.Z at hypomethylated genomic regions.

      Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      We appreciate the critical assessment of our manuscript by this reviewer. Although we acknowledge the limitations of our study and will revise the manuscript to better describe them, we would like to respectfully argue against the statement that our "conclusions […] are not supported by the data".

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The fact that the DNA resolution in the methylated structure was not high enough to resolve the positions of methylated CpGs despite a high overall resolution of 2.78 Å implies that 1) the Sat2R-P DNA was not as stably registered as the 601L sequence, requiring us to create two alternative Sat2R-P atomic models to account for the variable positioning in our samples, and 2) that the presence of DNA methylation increases that positional variability. We understand that one may prefer to see highly resolved density around each methylation mark, but we do believe that our inability to accomplish that is actually a feature rather than a weakness and has important biological implications. The decrease in local DNA resolution on the methylated Sat2R-P structure compared to its unmethylated counterpart is meaningful and suggests to us that DNA methylation weakens overall DNA wrapping and positioning on the nucleosome, supported by the increased flexibility seen at the linker DNA ends as well as an increase in the population of highly shifted nucleosomes amongst the methylated particles. Additionally, one major view in the DNA methylation/nucleosome stability field is that the presence of DNA methylation can make DNA stiffer and harder to bend, causing opening and destabilization of nucleosomes (Ngo et al., 2016). The increased opening of linker DNA ends and accessibility of methylated H2A.Z nucleosomes in our hands also aligns with such an idea, again suggesting decreased histone-DNA contact stability on methylated DNA substrates. We plan to revise the writing in our manuscript to better reflect these ideas.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      The reviewer raises a fair question about whether canonical H2A would experience the same DNA methylation-dependent structural effects. We had considered solving the H2A structures but ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes, with and without DNA methylation, that use a DNA sequence highly similar to our Sat2R-P (PDB: 5CPI and 5CPJ). The authors of this study did not see any physical differences between their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis.

      One of the primary reasons we chose the Sat2R-P sequence was, as noted above, that there already was a published study examining how DNA methylation affects nucleosome structure using a variant of this sequence, which we could compare to our results, as the reviewer has suggested. We did have to modify the sequence, namely by making it palindromic, in order to increase the final achievable resolution. We viewed the Sat2R-P sequence as an attractive candidate because it is physiologically relevant; the initial sequence was taken directly from human satellite II. Several modifications were made for technical reasons, including making the sequence palindromic as described above and also ensuring that each CpG is recognizable by a methylation-sensitive restriction enzyme so that we could be certain about the degree of methylation on our substrates. For us, these practical concerns outweighed the need to maintain a strictly physiological sequence. However, we still believe the final Sat2R-P more closely mimics physiological sequences than Widom 601. Additionally, human satellite II is a highly abundant sequence in the human genome that is known to undergo large methylation changes at the onset of many disorders, such as cancer, as well as during aging. Thus, there are interesting biological questions surrounding how the methylation state of this particular sequence affects chromatin structure. Furthermore, it has been reported that satellite II is devoid of H2A.Z (Capurso et al., 2012). Beyond those reasons, the satellite II sequence is generally interesting to our lab because we have been studying genes involved in ICF syndrome, where hypomethylation of satellite II sequences forms one of the hallmarks of this disorder (Funabiki et al., 2023; Jenness et al., 2018; Wassing et al., 2024). We understand that sequence context plays a large role in nucleosome wrapping and stability. This is why we strove to test multiple sequences in each of our assays. We do agree that it would be interesting to use DNA sequences where H2A.Z binding has already been described to be affected in a DNA methylation-dependent manner, which would make an exciting future study to pursue.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      We did not argue that only the global methylation level is relevant. We also would appreciate it if the reviewer could provide specific references that "clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent". We are aware of a series of studies conducted by Chongli Yuan's group, including one testing the effect of placing methylated CpGs at different positions along the Widom 601 sequence. In that study (Jimenez-Useche et al., 2013), they did find that positioning of mCpGs has differential impacts on the salt resistance of the nucleosomes, with 5 tandem mCpG copies at the dyad causing the most dramatic nucleosome opening whereas having mCpGs only at the DNA major grooves, but not elsewhere, increased nucleosome stability. However, they also found that methylation of the original Widom 601 sequence caused destabilization, albeit to a lesser degree, and another study by the same group (Jimenez-Useche et al., 2014) found that CpG methylation decreased nucleosome-forming ability for all tested variants of the Widom 601 sequence, regardless of CpG density or positioning.

      Other studies monitored how the distribution of methylated CpGs correlates with nucleosome positioning (Collings et al., 2013; Davey et al., 1997; Davey et al., 2004). However, these studies assessed the sequence-dependent effects specifically on nucleosome assembly during in vitro salt dialysis, which is a different physical process from the one our manuscript focuses on, especially considering that H2A.Z is deposited onto preassembled H2A nucleosomes. Our cryo-EM analysis examines the structural changes induced by DNA methylation on already formed nucleosomes rather than the process of formation. Thus, probing accessibility changes using a restriction enzyme was the more appropriate biochemical assay to verify our structures.

      We do very much agree that DNA context can influence nucleosome stability under different conditions. A study of molecular dynamics simulations concluded that the "combination of overall DNA geometrical and shape properties upon methylation" makes nucleosomes resistant to unwrapping (Li et al., 2022), while another modeling study suggests that DNA methylation impacts nucleosome stability in a manner dependent on DNA sequence, where "[s]trong binding is weakened and weak binding is strengthened" (Minary and Levitt, 2014). While G/C-dinucleotides are preferentially placed at major groove-inward positions in the nucleosomes in vivo (Chodavarapu et al., 2010; Segal et al., 2006) and G/C-rich segments are excluded from major groove-outward positions in Widom 601-like nucleosomes (Chua et al., 2012), methylated CpG dinucleotides are preferably, if not exclusively, located at major groove-outward positions in vivo. Mechanisms behind this biased mCpG positioning on the nucleosome remain speculative, likely caused by a combination of multiple factors, but the fact that we did not observe clear structural impacts using the Widom 601L sequence, where mCpGs are located at both the major groove-outward and -inward positions (Chua et al., 2012; our structure), deserves discussion. On the other hand, positioning of mCpG on the satellite II-derived sequences that we used in this study was based on a physiological sequence, and thus it may not be appropriate to say that those CpGs are placed at multiple "random" positions. Although we decided not to discuss the position of 5mC on our Sat2R nucleosome structure due to ambiguous base assignments, neither of our two atomic models is consistent with the idea that DNA methylation repositions the CpG to the outward major grooves. As the potential contribution of DNA stiffness to the effect of DNA methylation on nucleosome structure has been extensively studied (Choy et al., 2010; Li et al., 2022; Ngo et al., 2016; Perez et al., 2012), we believe that it is appropriate to consider overall DNA properties along the whole DNA sequence, though we are willing to discuss potential positional effects in the revised manuscript.

      Perhaps one of the most important points that we did not emphasize enough in our original manuscript was that, in contrast to the subtle, DNA sequence-dependent intrinsic effect of DNA methylation, we observed SRCAP-dependent preferential H2A.Z deposition to unmethylated DNA over methylated DNA in both 601 and satellite II DNAs. In the revised manuscript, we will clarify the value of comparing 601 and satellite II DNA for these two distinct mechanisms.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing this question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008), but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract the relatively subtle effect of DNA methylation on the intrinsic H2A.Z nucleosome stability. Therefore, we will revise the Abstract to make this point clearer.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      We believe the inclusion and factual reporting of negative data is important for the scientific community, as one of the major issues currently in biology research is the biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. We believe these data instead emphasize the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).
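
      The read-depth argument above can be illustrated with a toy calculation. The sketch below uses hypothetical numbers (not data from the study) and assumes a genome split into equal-width bins: the same number of reads is either spread over every bin (an H3-like, genome-wide signal) or concentrated at a few sites (an H2A.Z-like signal). The function names and parameters are illustrative assumptions only.

```python
# Toy illustration with hypothetical numbers; not data from the study.
TOTAL_READS = 10_000   # assumed identical sequencing depth for both marks
N_BINS = 1_000         # assumed number of equal-width genomic bins


def broad_profile(total_reads, n_bins):
    """Spread reads evenly over every bin (H3-like, genome-wide signal)."""
    return [total_reads / n_bins] * n_bins


def focal_profile(total_reads, n_bins, n_sites):
    """Concentrate all reads in n_sites bins (H2A.Z-like, site-specific signal)."""
    per_site = total_reads / n_sites
    return [per_site] * n_sites + [0.0] * (n_bins - n_sites)


h3_like = broad_profile(TOTAL_READS, N_BINS)
h2az_like = focal_profile(TOTAL_READS, N_BINS, n_sites=50)

# Same total coverage, very different peak heights.
print(max(h3_like), max(h2az_like))  # prints: 10.0 200.0
```

      With equal depth, the broadly distributed signal has a much lower maximum per bin, which is why the two tracks are better displayed on separate data ranges than compared by raw peak height.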

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      (Below is our response to (4) and (5) together.)

      We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to depend on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited on hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates), but that an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      As depicted in Figure 6 and described in the Discussion, we clearly indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system.

      As noted in our response to (2), we attribute the lack of a clear impact on our 601L structures to the extraordinarily strong artificial nucleosome positioning capacity of the 601 sequence and its variants. Since 601 is heavily used in chromatin biology, including within DNA methylation research, such negative data are still useful to include and publish.

      (7) The SRCAP depletion is insufficiently validated i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      We are willing to address this concern. However, please note that our data showed that methylation-dependent H2A.Z deposition is almost completely erased upon SRCAP depletion, indicating functionally effective depletion. The specificity of the custom antibody against Xenopus SRCAP was verified by mass spectrometry. Additionally, we have obtained the same effect using another commercially available SRCAP antibody, though we did not include this preliminary result in our original manuscript. Because SRCAP has a relatively low abundance and a high molecular weight, its western blot signals are weak, making it challenging to quantify the degree of depletion. We also believe that the value of quantification in this context, given the points noted above, is rather limited. In the past, our lab has published papers on depleting the H3T3 kinase Haspin from Xenopus egg extracts (Ghenoiu et al., 2013; Kelly et al., 2010), but we were never able to detect Haspin via western blot. This protein was only detected by mass spectrometry specifically on nucleosome array beads with H3K9me3 (Jenness et al., 2018). However, depletion of Haspin was readily monitored by erasure of H3T3ph, the enzymatic product of Haspin. In these experiments, it was impossible, and not critical, to quantitatively monitor the depletion of Haspin protein in order to investigate its molecular functions. Similarly, in the current study, the important fact is that depletion of SRCAP suppressed methylation-sensitive H2A.Z deposition, and quantifying the degree of SRCAP depletion would not have a major impact on this conclusion.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      We are aware that the Tip60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive. Therefore, the reviewer's statement that we "completely overlooked" Tip60-C’s role does not fairly report on our efforts. We wished to test the potential contribution of Tip60-C, but, unfortunately, the antibodies we currently have available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role Tip60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating Tip60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA and thus remains a subject for future study.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      We understand that the H2A.Z stability field is highly controversial. We have introduced the many conflicting reports that have been published in the field but can further expand on the controversies if desired. We also understand that the term “nucleosome stability” is broad and encompasses many physical aspects. As noted in a prior response, we will better specify our use of the term within the manuscript. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible than canonical H2A nucleosomes on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite, while others have obtained results similar to ours. We reported our findings on general H2A.Z stability in the hope of helping to clarify some of the field’s controversies.

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

      We appreciate this reviewer’s advice. However, please note that the first author who led this project has already successfully defended their PhD thesis primarily based on this project, making it impractical and unrealistic to completely change the focus of this manuscript to include an entirely new avenue of research. We believe that our data provide important insights into the mechanisms by which H2A.Z is excluded from methylated DNA, particularly via the DNA methylation-sensitive binding of SRCAP-C, which has never been described before. We agree that many questions are still left unanswered, including the exact molecular mechanism behind how DNA methylation prevents SRCAP-C binding. We have preliminary data that suggest none of the known DNA-binding modules of SRCAP-C, including ZNHIT1, by themselves can explain this sensitivity. This implies that domain dissection in the context of the holo-SRCAP complex is required to fully address this question. We believe this represents a very exciting future avenue of study; however, it does not negate our finding that SRCAP-C itself is important for maintaining the DNA methylation/H2A.Z antagonism. Therefore, we respectfully disagree with this reviewer's summary statement, which misleadingly undermines the impact of our work.

      Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found that their distributions differ between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      We are grateful that this reviewer recognizes the importance of our study.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      The altered H4 tail orientation and extra density seen on the acidic patch were incidental findings that we thought could be interesting for the field to be aware of but decided not to follow up on, as there were other structural differences that were more directly related to our central question. We do believe that the above two differences are linked to each other because we used a highly purified and homogeneous sample for cryo-EM analysis and the H4 tail/acidic patch interaction is a well-characterized contact that mediates inter-nucleosome interactions. Additionally, other groups have reported that the presence of DNA methylation causes condensation of both chromatin and bare DNA (cited within our manuscript), though the mechanics behind this phenomenon remain to be elucidated. We believed that our structural data may also align with those findings. However, the reviewer is fair in pointing out that we do not provide further experimental evidence verifying the existence of these increased interactions. We can revise our writing to clarify that these points are currently hypotheses rather than validated results.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

      We would like to thank the reviewer for bringing up these issues. Although our current data cannot explicitly distinguish between these possibilities, we favor the idea that DNA methylation specifically alters histone-DNA contacts and that this effect is felt globally across the entire nucleosome rather than only at specific locations. The intrinsic flexibility of linker DNA ends means that this region tends to exhibit the greatest differences under different physical influences, hence the focus on characterizing that area; flexibility of a thread on a spool is most pronounced at the ends. However, we also found that the DNA backbone of H2A.Z on methylated DNA had a lower local resolution compared to its unmethylated counterpart, despite that structure having a higher global resolution, which suggested to us that DNA positioning along the nucleosome is overall weaker in the presence of DNA methylation. This is corroborated by the increased population of open/shifted structures in our classification analysis. The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility, though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.

      References

      Berta, D.G., H. Kuisma, N. Valimaki, M. Raisanen, M. Jantti, A. Pasanen, A. Karhu, J. Kaukomaa, A. Taira, T. Cajuso, S. Nieminen, R.M. Penttinen, S. Ahonen, R. Lehtonen, M. Mehine, P. Vahteristo, J. Jalkanen, B. Sahu, J. Ravantti, N. Makinen, K. Rajamaki, K. Palin, J. Taipale, O. Heikinheimo, R. Butzow, E. Kaasinen, and L.A. Aaltonen. 2021. Deficient H2A.Z deposition is associated with genesis of uterine leiomyoma. Nature. 596:398–403.

      Capurso, D., H. Xiong, and M.R. Segal. 2012. A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics. 13:630.

      Chodavarapu, R.K., S. Feng, Y.V. Bernatavichute, P.Y. Chen, H. Stroud, Y. Yu, J.A. Hetzel, F. Kuo, J. Kim, S.J. Cokus, D. Casero, M. Bernal, P. Huijser, A.T. Clark, U. Kramer, S.S. Merchant, X. Zhang, S.E. Jacobsen, and M. Pellegrini. 2010. Relationship between nucleosome positioning and DNA methylation. Nature. 466:388–392.

      Choy, J.S., S. Wei, J.Y. Lee, S. Tan, S. Chu, and T.H. Lee. 2010. DNA methylation increases nucleosome compaction and rigidity. J Am Chem Soc. 132:1782–1783.

      Chua, E.Y., D. Vasudevan, G.E. Davey, B. Wu, and C.A. Davey. 2012. The mechanics behind DNA sequence-dependent properties of the nucleosome. Nucleic Acids Res. 40:6338–6352.

      Collings, C.K., P.J. Waddell, and J.N. Anderson. 2013. Effects of DNA methylation on nucleosome stability. Nucleic Acids Res. 41:2918–2931.

      Davey, C., S. Pennings, and J. Allan. 1997. CpG methylation remodels chromatin structure in vitro. J Mol Biol. 267:276–288.

      Davey, C.S., S. Pennings, C. Reilly, R.R. Meehan, and J. Allan. 2004. A determining influence for CpG dinucleotides on nucleosome positioning in vitro. Nucleic Acids Res. 32:4322–4331.

      Funabiki, H., I.E. Wassing, Q. Jia, J.D. Luo, and T. Carroll. 2023. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. Elife. 12.

      Ghenoiu, C., M.S. Wheelock, and H. Funabiki. 2013. Autoinhibition and polo-dependent multisite phosphorylation restrict activity of the histone h3 kinase haspin to mitosis. Mol Cell. 52:734–745.

      Jenness, C., S. Giunta, M.M. Muller, H. Kimura, T.W. Muir, and H. Funabiki. 2018. HELLS and CDCA7 comprise a bipartite nucleosome remodeling complex defective in ICF syndrome. Proc Natl Acad Sci U S A. 115:E876–E885.

      Jimenez-Useche, I., J. Ke, Y. Tian, D. Shim, S.C. Howell, X. Qiu, and C. Yuan. 2013. DNA methylation regulated nucleosome dynamics. Sci Rep. 3:2121.

      Jimenez-Useche, I., D. Shim, J. Yu, and C. Yuan. 2014. Unmethylated and methylated CpG dinucleotides distinctively regulate the physical properties of DNA. Biopolymers. 101:517–524.

      Kelly, A.E., C. Ghenoiu, J.Z. Xue, C. Zierhut, H. Kimura, and H. Funabiki. 2010. Survivin reads phosphorylated histone H3 threonine 3 to activate the mitotic kinase Aurora B. Science. 330:235–239.

      Li, S., Y. Peng, D. Landsman, and A.R. Panchenko. 2022. DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res. 50:1864–1874.

      Minary, P., and M. Levitt. 2014. Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A. 111:6293–6298.

      Ngo, T.T., J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha. 2016. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun. 7:10813.

      Nishiyama, A., L. Yamaguchi, J. Sharif, Y. Johmura, T. Kawamura, K. Nakanishi, S. Shimamura, K. Arita, T. Kodama, F. Ishikawa, H. Koseki, and M. Nakanishi. 2013. Uhrf1-dependent H3K23 ubiquitylation couples maintenance DNA methylation and replication. Nature. 502:249–253.

      Osakabe, A., F. Adachi, Y. Arimura, K. Maehara, Y. Ohkawa, and H. Kurumizaka. 2015. Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA. Open Biol. 5.

      Perez, A., C.L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M.L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez, and M. Orozco. 2012. Impact of methylation on the physical properties of DNA. Biophys J. 102:2140–2148.

      Segal, E., Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I.K. Moore, J.P. Wang, and J. Widom. 2006. A genomic code for nucleosome positioning. Nature. 442:772–778.

      Wassing, I.E., A. Nishiyama, R. Shikimachi, Q. Jia, A. Kikuchi, M. Hiruta, K. Sugimura, X. Hong, Y. Chiba, J. Peng, C. Jenness, M. Nakanishi, L. Zhao, K. Arita, and H. Funabiki. 2024. CDCA7 is an evolutionarily conserved hemimethylated DNA sensor in eukaryotes. Sci Adv. 10:eadp5753.

      Zilberman, D., D. Coleman-Derr, T. Ballinger, and S. Henikoff. 2008. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 456:125–129.

    1. Reviewer #3 (Public review):

      Summary:

      Bogdan et al. present an intriguing investigation into the spontaneous dynamics of prediction error (PE)-related brain states. Using two independent fMRI tasks designed to elicit prediction and prediction error in separate participant samples, alongside both fMRI and EEG data, the authors identify convergent brain network patterns associated with high versus low PE. Notably, they further show that similar patterns can be detected during resting-state fMRI, suggesting that PE-related neural states may recur outside of explicit task demands.

      Strengths:

      The authors use a well-integrated analytic framework that combines multiple prediction tasks and brain imaging modalities. The inclusion of several datasets probing PE under different contexts strengthens the claim of generalizability across tasks and samples. The open sharing of code and data is commendable and will be valuable for future work seeking to build on this framework.

      Weaknesses:

      A central challenge of the manuscript lies in interpreting the functional significance of PE-related brain network states during rest. Demonstrating that a task-defined cognitive state recurs spontaneously is intriguing, but without clear links to behavior, individual traits, or experiential content during rest, it remains difficult to interpret what such spontaneous brain states tell us about the mind and brain. For example, it is unclear whether these states support future inference or learning, reflect offline predictive processing, or instead suggest state reinstatement due to a more general form of neural plasticity and circuit dynamics in the brain. Demonstrating any one of these downstream relationships would be valuable since it has the potential to inform our understanding of cognitive function or more general principles of neural organization.

      I appreciate the authors' position that establishing the existence of such states is a necessary first step, and that future work may clarify their behavioral relevance. However, the current form makes it challenging to assess the conceptual advance of the present work in isolation.

      Relatedly, in my previous review I raised questions about both across- and within-individual variability, for example, whether individuals who exhibit stronger or more distinct PE-related fluctuations at rest also show superior performance on prediction-related tasks (across-individual), or whether momentary increases in PE-network expression during tasks relate to faster or more accurate prediction (within-individual). The authors thoughtfully addressed this suggestion by conducting an individual-differences analysis correlating each participant's fluctuation amplitude with approximately 200 behavioral and trait measures from the HCP dataset.

      The reported findings (a negative association with age and card-sorting performance, alongside a positive association with age-adjusted picture sequence memory) are interesting but difficult to interpret within a coherent functional framework. As presented, these results do not clearly support the idea that spontaneous PE-state fluctuations are related to enhancement in prediction, inference, or broader cognitive function. Instead, they raise the possibility that fluctuation amplitude may reflect more general factors (e.g., age) rather than a functionally meaningful PE-related process.

      Overall, while the methodological contribution is strong, the manuscript would benefit from a clearer articulation of what functional conclusions can or cannot be drawn from the presence of spontaneous PE-related states, as well as a more cautious framing of their potential cognitive significance.

      Further comments:

      I appreciate that the authors took my earlier suggestions seriously and incorporated additional analyses examining behavioral relevance and permutation tests in the revision.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The Reviewer structured their review such that their first two recommendations specifically concerned the two major weaknesses they viewed in the initial submission. For clarity and concision, we have copied their recommendations to be placed immediately following their corresponding points on weaknesses.

      Strengths:

      Studying prediction error from the lens of network connectivity provides new insights into predictive coding frameworks. The combination of various independent datasets to tackle the question adds strength, including two well-powered fMRI task datasets, resting-state fMRI interpreted in relation to behavioral measures, as well as EEG-fMRI.

      Weaknesses:

      Major:

      (R1.1) Lack of multiple comparisons correction for edge-wise contrast:

      The analysis of connectivity differences across three levels of prediction error was conducted separately for approximately 22,000 edges (derived from 210 regions), yet no correction for multiple comparisons appears to have been applied. Then, modularity was applied to the top 5% of these edges. I do not believe that this approach is viable without correction. It does not help that a completely separate approach using SVMs was FDR-corrected for 210 regions.
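
      As a quick check of the figures in this comment, the arithmetic can be sketched as follows. The sketch assumes an undirected connectome without self-connections, which the review does not state explicitly; the function name is hypothetical. Under that assumption, 210 regions give 21,945 unique edges, consistent with the "approximately 22,000" quoted above.

```python
def n_undirected_edges(n_regions: int) -> int:
    """Number of unique region pairs, assuming an undirected
    connectome with no self-connections (an assumption of this sketch)."""
    return n_regions * (n_regions - 1) // 2


n_edges = n_undirected_edges(210)
# Size of the "top 5%" edge set that the modularity step would receive.
top_5_percent = round(0.05 * n_edges)
print(n_edges, top_5_percent)
```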

      [Later recommendation] Regarding the first major point: To address the issue of multiple comparisons in the edge-wise connectivity analysis, I recommend using the Network-Based Statistic (NBS; Zalesky et al., 2010). NBS is well-suited for identifying clusters (analogous to modules) of edges that show statistically significant differences across the three prediction error levels, while appropriately correcting for multiple comparisons.

      Thank you for bringing this up. We acknowledge that our modularity analysis does not evaluate statistical significance. Originally, the modularity analysis was meant to provide a connectome-wide summary of the connectivity effects, whereas the classification-based analysis was meant to address the need for statistical significance testing. However, as the reviewer points out, it would be better if significance were tested in a manner more analogous to the reported modules. As they suggest, we updated the Supplemental Materials (SM) to include the results of Network-Based Statistic analysis (SM p. 1-2):

      “(2.1) Network-Based Statistic

      Here, we evaluate whether PE significantly impacts connectivity at the network level using the Network-Based Statistic (NBS) approach.[1] NBS relied on the same regression data generated for the main-text analysis, whereby a regression is performed examining the effect of PE (Low = –1, Medium = 0, High = +1) on connectivity for each edge. This was done across the connectome, and for each edge, a z-score was computed. For NBS, we thresholded edges to |Z| > 3.0, which yielded one large network cluster, shown in Figure S3. The size of the cluster – i.e., number of edges – was significant (p < .05) per a permutation-test using 1,000 random shuffles of the condition data for each participant, as is standard.[1] These results demonstrate that the network-level effects of PE on connectivity are significant. The main-text modularity analysis converts this large cluster into four modules, which are more interpretable and open the door to further analyses”.
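
      For readers unfamiliar with the procedure, the steps described in the quoted passage (edge-wise statistic, threshold, largest connected cluster, permutation null) can be sketched roughly as below. This is a minimal illustration and not the authors' code: the array layout, all function names, and the use of a one-sample t-like statistic on per-participant slopes in place of the reported z-score are assumptions of the sketch.

```python
import numpy as np


def edge_statistics(conn, pe):
    """Per-edge statistic for the linear effect of PE on connectivity.

    conn: (n_subjects, n_conditions, n_edges) connectivity values.
    pe:   (n_subjects, n_conditions) PE codes, e.g. -1, 0, +1.
    Fits a slope per participant and edge, then forms a one-sample
    t-like statistic across participants (a stand-in for the z-score).
    """
    pe_c = pe - pe.mean(axis=1, keepdims=True)
    slopes = (conn * pe_c[:, :, None]).sum(axis=1) / (pe_c ** 2).sum(axis=1)[:, None]
    sem = slopes.std(axis=0, ddof=1) / np.sqrt(slopes.shape[0])
    return slopes.mean(axis=0) / sem


def largest_component_size(supra, edges, n_nodes):
    """Edge count of the largest connected cluster of suprathreshold edges."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = edges[supra]
    for i, j in kept:
        parent[find(i)] = find(j)
    counts = {}
    for i, _ in kept:
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values(), default=0)


def nbs(conn, pe, edges, n_nodes, threshold=3.0, n_perm=1000, seed=0):
    """Observed largest-cluster size and its permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = largest_component_size(
        np.abs(edge_statistics(conn, pe)) > threshold, edges, n_nodes)
    null = np.empty(n_perm, dtype=int)
    for k in range(n_perm):
        # Shuffle condition labels independently within each participant.
        shuffled = rng.permuted(pe, axis=1)
        null[k] = largest_component_size(
            np.abs(edge_statistics(conn, shuffled)) > threshold, edges, n_nodes)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

      Because the null distribution is built from the largest cluster found in each shuffle, the resulting p-value applies to the cluster as a whole rather than to any individual edge within it.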

      We updated the Results to mention these findings before describing the modularity analysis (p. 8-9):

      “After demonstrating that PE significantly influences brain-wide connectivity using Network-Based Statistic analysis (Supplemental Materials 2.1), we conducted a modularity analysis to study how specific groups of edges are all sensitive to high/low-PE information.”

      (R1.2) Lack of spatial information in EEG:

      The EEG data were not source-localized, and no connectivity analysis was performed. Instead, power fluctuations were averaged across a predefined set of electrodes based on a single prior study (reference 27), as well as across a broader set of electrodes. While the study correlates these EEG power fluctuations with fMRI network connectivity over time, such temporal correlations do not establish that the EEG oscillations originate from the corresponding network regions. For instance, the observed fronto-central theta power increases could plausibly originate from the dorsal anterior cingulate cortex (dACC), as consistently reported in the literature, rather than from a distributed network. The spatially agnostic nature of the EEG-fMRI correlation approach used here does not support interpretations tied to specific dorsal-ventral or anterior-posterior networks. Nonetheless, such interpretations are made throughout the manuscript, which overextends the conclusions that can be drawn from the data.

      [Later recommendation] Regarding the second major point: I suggest either adopting a source-localized EEG approach to assess electrophysiological connectivity or revising all related sections to avoid implying spatial specificity or direct correspondence with fMRI-derived networks. The current approach, which relies on electrode-level power fluctuations, does not support claims about the spatial origin of EEG signals or their alignment with specific connectivity networks.

      We thank the reviewer for this important point, which allows us to clarify the specific and distinct contributions of each imaging modality in our study. Our primary goal for Study 3 was to leverage the high temporal resolution of EEG to identify the characteristic frequency at which the fMRI-defined global connectivity states fluctuate. The study was not designed to infer the spatial origin of these EEG signals, a task for which fMRI is better suited and which we addressed in Studies 1 and 2.

      As the reviewer points out, fronto-central theta is generally associated with the dACC. We agree with this point entirely. We suspect that there is some process linking dACC activation to the identified network fluctuations – some type of relationship that does not manifest in our dynamic functional connectivity analyses – although this is only a hypothesis and one that is beyond the present scope.

      We updated the Discussion to mention these points and acknowledge the ambiguity regarding the correlation between network fluctuation amplitude (fMRI) and Delta/Theta power (EEG) (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism. Additionally, Theta is widely seen as a sign of dorsal anterior cingulate cortex activity,[3] and it is unclear how to reconcile this with our claims about network fluctuations. Nonetheless, as we show with simulations (Supplemental Materials 5), a correlation between slow fMRI network fluctuations and fast EEG Delta/Theta oscillations is also consistent with a common global neural process oscillating rapidly and eliciting both measures.”

      Regarding source-localization, several papers have described known limitations of this strategy for drawing precise anatomical inferences,[4–6] and this seems unnecessary given that our fMRI analyses already provide more robust anatomical precision. We intentionally used EEG in our study for what it measures most robustly: millisecond-level temporal dynamics.

      (R1.2a) Examples of problematic language include:

      Line 134: "detection of network oscillations at fast speeds" - the current EEG approach does not measure networks.

      This is an important issue. We acknowledge that our EEG approach does not directly measure fMRI-defined networks. Our claim is inferential, designed to estimate the temporal dynamics of the large-scale fMRI patterns we identified. The correlation between our fMRI-derived fluctuation amplitude (|PA – VD|) and 3-6 Hz EEG power provides suggestive evidence that the transitions between these network states occur at this frequency, rather than being a direct measurement of network oscillations.
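
      To make the quantity in this response concrete, the sketch below computes a fluctuation amplitude |PA – VD| per time window and correlates it with 3-6 Hz EEG power in the same windows. It is an illustration under stated assumptions, not the authors' pipeline: the windowing, the FFT-based band-power estimate, the Pearson correlation, and all function and parameter names are hypothetical.

```python
import numpy as np


def fluctuation_amplitude(pa, vd):
    """Unsigned difference between posterior-anterior (PA) and
    ventral-dorsal (VD) connectivity, one value per time window."""
    return np.abs(np.asarray(pa, dtype=float) - np.asarray(vd, dtype=float))


def band_power(eeg, fs, lo=3.0, hi=6.0):
    """Mean spectral power between lo and hi Hz for each window.

    eeg: (n_windows, n_samples) EEG segments aligned to the fMRI windows.
    fs:  sampling rate in Hz.
    """
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(eeg, axis=1)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return power[:, band].mean(axis=1)


def amplitude_power_correlation(pa, vd, eeg, fs):
    """Pearson correlation between |PA - VD| and 3-6 Hz power across windows."""
    amp = fluctuation_amplitude(pa, vd)
    return np.corrcoef(amp, band_power(eeg, fs))[0, 1]
```

      Taking the absolute value discards which network is dominant in a window, so a positive correlation here reflects how far apart the two connectivity measures are rather than which of them is stronger.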

      To support the validity of this inference, we performed two key analyses (now in Supplemental Materials). First, a simulation study provides a proof-of-concept, confirming our method can recover the frequency of a fast underlying oscillator from slow fMRI and fast EEG data. Second, a specificity analysis shows the EEG correlation is unique to our measure of fluctuation amplitude and not to simpler measures like overall connectivity strength. These analyses demonstrate that our interpretation is more plausible than alternative explanations.

      Overall, we have revised the manuscript to be more conservative in the language employed, such as presenting alternative explanations to the interpretations put forth based on correlative/observational evidence (e.g., our modifications above described in our response to comment R1.2). In addition, we have made changes throughout the report to state the issues related to reverse inference more explicitly and to better communicate that the evidence is suggestive – please see our numerous changes described in our response to comment R3.1. For the statement that the reviewer specifically mentioned here, we revised it to be more cautious (p. 7):

      “Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.”

      (R1.2b) Line 148: "whether fluctuations between high- and low-PE networks occur sufficiently fast" - this implies spatial localization to networks that is not supported by the EEG analysis.

      Building on our changes described in our immediately prior response, we adjusted our text here to say our analyses searched for evidence consistent with the idea that the network fluctuations occur quickly rather than searching for decisive evidence favoring this idea (p. 7-8):

      “Finally, we examined rs-fMRI-EEG data to assess whether we find parallels consistent with the high/low-PE network fluctuations occurring at fast timescales suitable for the type of cognitive operations typically targeted by PE theories.”

      (R1.2c) Line 480: "how underlying neural oscillators can produce BOLD and EEG measurements" - no evidence is provided that the same neural sources underlie both modalities.

      As described above, these claims are based on the simulation study demonstrating that this is a possibility, and we have revised the manuscript overall to be clearer that this is our interpretation while providing alternative explanations.

      Reviewer #2 (Public review):

      Strengths:

      Clearly, a lot of work and data went into this paper, including 2 task-based fMRI experiments and the resting state data for the same participants, as well as a third EEG-fMRI dataset. Overall, well written with a couple of exceptions on clarity, as per below, and the methodology appears overall sound, with a couple of exceptions listed below that require further justification. It does a good job of acknowledging its own weakness.

      Weaknesses:

      (R2.1) The paper does a good job of acknowledging its greatest weakness, the fact that it relies heavily on reverse inference, but cannot quite resolve it. As the authors put it, "finding the same networks during a prediction error task and during rest does not mean that the networks' engagement during rest reflects prediction error processing". Again, the authors acknowledge the speculative nature of their claims in the discussion, but given that this is the key claim and essence of the paper, it is hard to see how the evidence is compelling to support that claim.

      We thank the reviewer for this comment. We agree that reverse inference is a fundamental challenge and that our central claim requires a particularly high bar of evidence. While no single analysis resolves this issue, our goal was to build a cumulative case that is compelling by converging on the same conclusion from multiple, independent lines of evidence.

      For our investigation, we initially established a task-general signature of prediction error (PE). By showing the same neural pattern represents PE in different contexts, we constrain the reverse inference, making it less likely that our findings are a task-specific artifact and more likely that they reflect the core, underlying process of PE. Building on this, our most compelling evidence comes from linking task and rest at the individual level. We didn't just find the same general network at rest; we showed that an individual’s unique anatomical pattern of PE-related connectivity during the task specifically predicts their own brain's fluctuation patterns at rest. This highly specific, person-by-person correspondence provides a direct bridge between an individual's task-evoked PE processing and their intrinsic, resting-state dynamics. Furthermore, these resting-state fluctuations correlate specifically with the 3-6 Hz theta rhythm—a well-established neural marker for PE.

      While reverse inference remains a fundamental limitation for many studies on resting-state cognition, the aspects mentioned above, we believe, provide suggestive evidence, favoring our PE interpretation. Nonetheless, we have made changes throughout the manuscript to be more conservative in the language we use to describe our results, to make it clear what claims are based on correlative/observational evidence, and to put forth alternative explanations for the identified effects. Please find our numerous changes detailed in our response to comment R3.1.

      (R2.2) Given how uncontrolled cognition is during "resting-state" experiments, the parallel made with prediction errors elicited during a task designed for that effect is a little difficult to make. How often are people really surprised when their brains are "at rest", likely replaying a previously experienced event or planning future actions under their control? It seems to be more likely a very low prediction error scenario, if at all surprising.

      We (and some others) take a broad interpretation of PE and believe it is often more intuitive to think about PE minimization in terms of uncertainty rather than “surprise”; the word “surprise” usually implies a sudden emotive reaction from the violation of expectations, which is not useful here.

      When planning future actions, each step of the plan is spurred by the uncertainty of what is the appropriate action given the scenario set up by prior steps. Each planned step erases some of that uncertainty. For example, you may be mentally simulating a conversation, what you will say, and what another person will say. Each step of this creates uncertainty of “what is the appropriate response?” Each reasoning step addresses contingencies. While planning, you may also uncover more obvious forms of uncertainty, sparking memory retrieval to finish it. A resting-state participant may think to cook a frozen pizza when they arrive home, but be uncertain about whether they have any frozen pizzas left, prompting episodic memory retrieval to address this uncertainty. We argue that every planning step or memory retrieval can be productively understood as being sparked by uncertainty/surprise (PE), and the subsequent cognitive response minimizes this uncertainty.

      We updated the Introduction to include a paragraph near the start providing this explanation (p. 3-4):

      “PE minimization may broadly coordinate brain functions of all sorts, including abstract cognitive functions. This includes the types of cognitive processes at play even in the absence of stimuli (e.g., while daydreaming). While it may seem counterintuitive to associate this type of cognition with PE – a concept often tied to external surprises – it has been proposed that the brain's internal generative model is continuously active.[12–14] Spontaneous thought, such as planning a future event or replaying a memory, is not a passive, low-PE process. Rather, it can be seen as a dynamic cycle of generating and resolving internal uncertainty. While daydreaming, you may be reminded of a past conversation, where you wish you had said something different. This situation contains uncertainty about what would have been the best thing to say. Wondering about what you wish you said can be viewed as resolving this uncertainty, in principle, forming a plan if the same situation ever arises again in the future. Each iteration of the simulated conversation repeatedly sparks and then resolves this type of uncertainty.”

      (R2.3) The quantitative comparison between networks under task and rest was done on a small subset of the ROIs rather than on the full network - why? Noting how small the correlation between task and rest is (r=0.021) and that's only for part of the networks, the evidence is a little tenuous. Running the analysis for the full networks could strengthen the argument.

      We thank the reviewer for this opportunity to clarify our method. A single correlation between the full, aggregated networks would be conceptually misaligned with what we aimed to assess. To test for a person-specific anatomical correspondence, it is necessary to examine the link between task and rest at a granular level. We therefore asked whether the specific parts of an individual's network most responsive to PE during the task are the same parts that show the strongest fluctuations at rest. Our analysis, performed iteratively across all 3,432 possible ROI subsets, was designed specifically to answer this question, which would be obscured by an aggregated network measure.

      We appreciate the reviewer's concern about the modest effect size (r = .021). However, this must be contextualized, as the short task scan has very low reliability (.08), which imposes a severe statistical ceiling on any possible task-rest correlation. Finding a highly significant effect (p < .001) in the face of such noisy data, therefore, provides robust evidence for a genuine task-rest correspondence.
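
      The ceiling argument can be made concrete with the classical attenuation relation from test theory, under which an observed correlation cannot exceed the square root of the product of the two measures' reliabilities. The sketch below is illustrative only: the task reliability (.08) and the observed correlation (.021) are taken from the text above, whereas the resting-state reliabilities are assumed placeholder values, not reported results, and the function names are hypothetical.

```python
import math

def max_observable_r(rel_x: float, rel_y: float) -> float:
    """Upper bound on an observed correlation given the two reliabilities."""
    return math.sqrt(rel_x * rel_y)

def disattenuated_r(r_obs: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation (estimate of the noise-free correlation)."""
    return r_obs / math.sqrt(rel_x * rel_y)

TASK_RELIABILITY = 0.08  # split-half reliability of the task contrast (from the text)
R_OBSERVED = 0.021       # observed task-rest correlation (from the text)

# Resting-state reliabilities below are ASSUMED for illustration only.
for rest_reliability in (1.0, 0.8, 0.5):
    ceiling = max_observable_r(TASK_RELIABILITY, rest_reliability)
    corrected = disattenuated_r(R_OBSERVED, TASK_RELIABILITY, rest_reliability)
    print(f"rest reliability {rest_reliability:.1f}: "
          f"ceiling r = {ceiling:.3f}, disattenuated r = {corrected:.3f}")
```

      Even with a perfectly reliable resting-state measure, this relation caps the observable correlation at roughly .28 when the task reliability is .08, which is the sense in which the low reliability constrains the effect size.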

      We updated the Discussion to discuss this point (p. 22-23):

      “A key finding supporting our interpretation is the significant link between individual differences in task-evoked PE responses and resting-state fluctuations. One might initially view the effect size of this correspondence (r = .021) as modest. However, this interpretation must be contextualized by the considerable measurement noise inherent in short task-fMRI scans; the split-half reliability of the task contrast was only .08. This low reliability imposes a severe statistical ceiling on any possible task-rest correlation. Therefore, detecting a highly significant (p < .001) relationship despite this constraint provides robust evidence for a genuine link. Furthermore, our analytical approach, which iteratively examined thousands of ROI subsets rather than one aggregated network, was intentionally granular. The goal was not simply to correlate two global measures, but to test for a person-specific anatomical correspondence – that is, whether the specific parts of an individual's network most sensitive to PE during the task are the same parts that fluctuate most strongly at rest. An aggregate analysis would obscure this critical spatial specificity. Taken together, this granular analysis provides compelling evidence for an anatomically consistent fingerprint of PE processing that bridges task-evoked activity and spontaneous resting-state dynamics, strengthening our central claim.”

      (R2.4) Looking at the results in Figure 2C, the four-quadrant description of the networks labelled for low and high PE appears a little simplistic. The authors state that this four-quadrant description omits some ROIs as motivated by prior knowledge. This would benefit from a more comprehensive justification. Which ROIs are excluded, and what is the evidence for exclusion?

      Our four-quadrant model is a principled simplification designed to distill the dominant, large-scale connectivity patterns from the complex modularity results. This approach focuses on coherent, well-documented anatomical streams while setting aside a few anatomically distant and disjoint ROIs that were less central to the main modules. This heuristic additionally unlocks more robust and novel analyses.

      The two low-PE posterior-anterior (PA) pathways are grounded in canonical processing streams. (i) The OC-ATL connection mirrors the ventral visual stream (the “what” pathway), which is fundamental for object recognition and is upregulated during the smooth processing of expected stimuli. (ii) The IPL-LPFC connection represents a core axis of the dorsal attention stream and the Fronto-Parietal Control Network (FPCN), reflecting the maintenance of top-down cognitive control when information is predictable; the IPL-LPFC module excludes ROIs in the middle temporal gyrus, which are often associated with the FPCN but are not covered here.

      In contrast, the two high-PE ventral-dorsal (VD) pathways reflect processes for resolving surprise and conflict. (i) The OC-IPL connection is a classic signature of attentional reorienting, where unexpected sensory input (high PE) triggers a necessary shift in attention; the OC-IPL module excludes some ROIs that are anterior to the occipital lobe and enter the fusiform gyrus and inferior temporal lobe. (ii) The ATL-LPFC connection aligns with mechanisms for semantic re-evaluation, engaging prefrontal control regions to update a mental model in the face of incongruent information.

      Beyond its functional/anatomical grounding, this simplification provides powerful methodological and statistical advantages. It establishes a symmetrical framework that makes our dynamic connectivity analyses tractable, such as our “cube” analysis of state transitions, which required overlapping modules. Critically, this model also offers a statistical safeguard. By ensuring each quadrant contributes to both low- and high-PE connectivity patterns, we eliminate confounds like region-specific signal variance or global connectivity. This design choice isolates the phenomenon to the pattern of connectivity itself (posterior-anterior vs. ventral-dorsal), making our interpretation more robust.

      We updated the end of the Study 1A results (p. 10-11):

      “Some ROIs appear in Figure 2C but are excluded from the four targeted quadrants (Figures 2C & 2D) – e.g., posterior inferior temporal lobe and fusiform ROIs are excluded from the OC-IPL module, and middle temporal gyrus ROIs are excluded from the IPL-LPFC modules. These exclusions, in favor of a four-quadrant interpretation, are motivated by existing knowledge of prominent structural pathways among these quadrants. This interpretation is also supported by classifier-based analyses showing connectivity within each quadrant is significantly influenced by PE (Supplemental Materials 2.2), along with analyses of single-region activity showing that these areas also respond to PE independently (Supplemental Materials 3). Hence, we proceeded with further analyses of these quadrants’ connections, which summarize PE’s global brain effects.

      “This four-quadrant setup also imparts analytical benefits. First, this simplified structure may better generalize across PE tasks, and Study 1B would aim to replicate these results with a different design. Second, the four quadrants mean that each ROI contributes to both the posterior-anterior and ventral-dorsal modules, which would benefit later analyses and rules out confounds such as PE eliciting increased/decreased connectivity between an ROI and the rest of the brain. An additional, less key benefit is that this setup allows more easily evaluating whether the same phenomena arise using a different atlas (Supplemental Materials Y).”

      (R2.5) The EEG-fMRI analysis claiming 3-6Hz fluctuations for PE is hard to reconcile with the fact that fMRI captures activity that is a lot slower, while some PEs are as fast as 150 ms. The discussion acknowledges this but doesn't seem to resolve it - would benefit from a more comprehensive argument.

      We thank the reviewer for raising this important point, which allows us to clarify the logic of our multimodal analysis. Our analysis does not claim that the fMRI BOLD signal itself oscillates at 3-6 Hz. Instead, it is based on the principle that the intensity of a fast neural process can be reflected in the magnitude of the slow BOLD response. It’s akin to using a long-exposure photograph to capture a fast-moving object; while the individual movements are blurred, the intensity of the blur in the photo serves as a proxy for the intensity of the underlying motion. In our case, the magnitude of the fMRI network difference (|PA – VD|) acts as the "blur," reflecting the intensity of the rapid fluctuations between states within that time window.

      Following this logic, we correlated this slow-moving fMRI metric with the power of the fast EEG rhythms, which reflects their amplitude. To bridge the different timescales, we averaged the EEG power over each fMRI time window and convolved it with the standard hemodynamic response function (HRF) – a crucial step to align the timing of the neural and metabolic signals. The resulting significant correlation specifically in the 3-6 Hz band demonstrates that when this rhythm is stronger, the fMRI data shows a greater divergence between network states. This allows us to infer the characteristic frequency of the underlying neural fluctuations without directly measuring them at that speed with fMRI, thus reconciling the two timescales.
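
      The alignment step described above (window-averaged EEG power, convolved with an HRF, then correlated with the fMRI amplitude measure) can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the pipeline used in the study: the double-gamma HRF parameters, the TR of 2 s, the number of EEG samples per window, and all function names are assumptions chosen for the example.

```python
import math
import random

def double_gamma_hrf(tr, duration=32.0):
    """Sample an ASSUMED double-gamma HRF (peak near 5 s, undershoot near 15 s)."""
    def gamma_pdf(t, shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape) if t > 0 else 0.0
    return [gamma_pdf(i * tr, 6.0) - gamma_pdf(i * tr, 16.0) / 6.0
            for i in range(int(duration / tr))]

def window_means(samples, per_window):
    """Average consecutive, non-overlapping groups of EEG power samples."""
    return [sum(samples[i:i + per_window]) / per_window
            for i in range(0, len(samples) - per_window + 1, per_window)]

def convolve_causal(signal, kernel):
    """Causal convolution, truncated to the length of the input signal."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k in range(min(i + 1, len(kernel))):
            acc += signal[i - k] * kernel[k]
        out.append(acc)
    return out

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Synthetic demonstration: every value below is simulated, not real data.
rng = random.Random(0)
TR = 2.0                                             # assumed repetition time in seconds
eeg_power = [rng.random() for _ in range(200 * 4)]   # 4 hypothetical power samples per window
theta_per_window = window_means(eeg_power, 4)        # one EEG power value per fMRI window
regressor = convolve_causal(theta_per_window, double_gamma_hrf(TR))
fmri_amplitude = [v + rng.gauss(0.0, 0.01) for v in regressor]  # stand-in for |PA - VD|
r = pearson(regressor, fmri_amplitude)
print(f"correlation between HRF-convolved EEG power and fMRI amplitude: {r:.3f}")
```

      In this toy setup the fMRI amplitude is constructed from the convolved regressor, so a high correlation is expected by design; the point is only to show how the two timescales are brought onto a common grid before correlating.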

      Reviewer #3 (Public review):

      Bogdan et al. present an intriguing and timely investigation into the intrinsic dynamics of prediction error (PE)-related brain states. The manuscript is grounded in an intuitive and compelling theoretical idea: that the brain alternates between high and low PE states even at rest, potentially reflecting an intrinsic drive toward predictive minimization. The authors employ a creative analytic framework combining different prediction tasks and imaging modalities. They shared open code, which will be valuable for future work.

      (R3.1) Consistency in Theoretical Framing

      The title, abstract, and introduction suggest inconsistent theoretical goals of the study.

      The title suggests that the goal is to test whether there are intrinsic fluctuations in high and low PE states at rest. The abstract and introduction suggest that the goal is to test whether the brain intrinsically minimizes PE and whether this minimization recruits global brain networks. My comments here are that a) these are fundamentally different claims, and b) both are challenging to falsify. For one, task-like recurrence of PE states during resting might reflect the wiring and geometry of the functional organization of the brain emerging from neurobiological constraints or developmental processes (e.g., experience), but showing that mirroring exists because of the need to minimize PE requires establishing a robust relationship with behavior or showing a causal effect (e.g., that interrupting intrinsic PE state fluctuations affects prediction).

      The global PE hypothesis ("PE minimization is a principle that broadly coordinates brain functions of all sorts, including abstract cognitive functions") is more suitable for the discussion than as the main claim in the abstract, introduction, and throughout the paper.

      Given the above, I recommend that the authors clarify and align their core theoretical goals across the title, abstract, introduction, and results. If the focus is on identifying fluctuations that resemble task-defined PE states at rest, the language should reflect that more narrowly, and save broader claims about global PE minimization for the discussion. This hypothesis also needs to be contextualized within prior work. I'd like to see if there is similar evidence in the literature using animal models.

      Thank you for bringing up this issue. We have made changes throughout the paper to address these points. First, we have omitted reference to a “global PE hypothesis” from the Abstract and Introduction, in favor of structuring the Introduction in terms of a falsifiable question (p. 4):

      “We pursued this goal using three studies (Figure 1) that collectively targeted a specific question: Do the task-defined connectivity signatures of high vs. low PE also recur during rest, and if so, how does the brain transition between exhibiting high/low signatures?”

      We made changes later in the Introduction to clarify that the investigation is based on correlative evidence and requires interpretations that may be debated (p. 5-7):

      “Although this does not entirely address the reverse inference dilemma and can only produce correlative evidence, the present research nonetheless investigates these widely speculated upon PE ideas more directly than any prior work.

      Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.

      Second, we examined the recruitment of these networks during rs-fMRI, and although the problems related to reverse inference are impossible to overcome fully, we engage with this issue by linking rs-fMRI data directly to task-fMRI data of the same participants, which can provide suggestive evidence that the same neural mechanisms are at play in both.”

      We made changes throughout the Results now better describing the results as consistent with a hypothesis rather than demonstrating it (p. 12-19):

      “In other words, we essentially asked whether resting-state participants are sometimes in low PE states and sometimes in high PE states, which would be consistent with spontaneous PE processing in the absence of stimuli.

      These emerging states overlap strikingly with the previous task effects of PE, suggesting that rs-fMRI scans exhibit fluctuations that resemble the signatures of low- and high-PE states. 

      To be clear, this does not entirely dissuade concerns about reverse inference, which would require a type of causal manipulation that is difficult (if not impossible) to perform in a resting state scan. Nonetheless, these results provide further evidence consistent with our interpretation that the resting brain spontaneously fluctuates between high/low PE network states.

      These patterns are most consistent with a characteristic timescale near 3–6 Hz for the amplitude of the putative high/low-PE fluctuations. This is notably consistent with established links between PE and Delta/Theta and is further consistent with an interpretation in which these fluctuations relate to PE-related processing during rest.”

      We have also made targeted edits to the Discussion to present the findings in a more cautious way, more clearly state what is our interpretation, and provide alternative explanations (p. 19-26):

      “The present research conducted task-fMRI, rs-fMRI, and rs-fMRI-EEG studies to clarify whether PE elicits global connectivity effects and whether the signatures of PE processing arise spontaneously during rest. This investigation carries implications for how PE minimization may characterize abstract task-general cognitive processes. […] Although there are different ways to interpret this correlation, it is consistent with high/low PE states generally fluctuating at 3-6 Hz during rest. Below, we discuss these three studies’ findings.

      Our rs-fMRI investigation examined whether resting dynamics resemble the task-defined connectivity signatures of high vs. low PE, independent of the type of stimulus encountered. The resting-state analyses indeed found that, even at rest, participants’ brains fluctuated between strong ventral-dorsal connectivity and strong posterior-anterior connectivity, consistent with shifts between states of high and low PE. This conclusion is based on correlative/observational evidence and so may be controversial as it relies on reverse inference.

      These patterns resemble global connectivity signatures seen in resting-state participants, and correlations between fMRI and EEG data yield associations, consistent with participants fluctuating between high-PE (ventral-dorsal) and low-PE (posterior-anterior) states at 3-6 Hz. Although definitively testing these ideas is challenging, given that rs-fMRI is defined by the absence of any causal manipulations, our results provide evidence consistent with PE minimization playing a role beyond stimulus processing.”

      (R3.2) Interpretation of PE-Related Fluctuations at Rest and Its Functional Relevance. It would strengthen the paper to clarify what is meant by "intrinsic" state fluctuations. Intrinsic might mean task-independent, trait-like, or spontaneously generated. Which do the authors mean here? Is the key prediction that these fluctuations will persist in the absence of a prediction task?

      Of the three terms the reviewer mentioned, “spontaneous” and “task-independent” are the most accurate descriptors. We conceptualize these fluctuations as a continuous background process that persists across all facets of cognition, without requiring a task explicitly designed to elicit prediction error – although we, along with other predictive coding papers, would argue that all cognitive tasks are fundamentally rooted in PE mechanisms and thus anything can be seen as a “prediction task” (see our response to comment R2.2 for our changes to the Introduction that provide more intuition for this point). The proposed interactions can be seen as analogous to cortico-basal-thalamic loops, which are engaged across a vast and diverse array of cognitive processes.

      The prior submission only used the word “intrinsic” in the title. We have since revised it to “spontaneous,” which is more specific than “intrinsic,” and we believe clearer for a title than “task-independent” (p. 1): “Spontaneous fluctuations in global connectivity reflect transitions between states of high and low prediction error”

      We have also made tweaks throughout the manuscript to now use “spontaneously” throughout (it now appears 8 times in the paper).

      Regardless of the intrinsic argument, I find it challenging to interpret the results as evidence of PE fluctuations at rest. What the authors show directly is that the degree to which a subset of regions within a PE network discriminates high vs. low PE during task correlates with the magnitude of separation between high and low PE states during rest. While this is an interesting relationship, it does not establish that the resting-state brain spontaneously alternates between high and low PE states, nor that it does so in a functionally meaningful way that is related to behavior. How can we rule out brain dynamics of other processes, such as arousal, that also rise and fall with PE? I understand the authors' intention to address the reverse inference concern by testing whether "a participant's unique connectivity response to PE in the reward-processing task should match their specific patterns of resting-state fluctuation". However, I'm not fully convinced that this analysis establishes the functional role of the identified modules to PE because of the following:

      Theoretically, relating the activities of the identified modules directly to behavior would demonstrate a stronger functional role.

      (R3.2a) Across participants: Do individuals who exhibit stronger or more distinct PE-related fluctuations at rest also perform better on tasks that require prediction or inference? This could be assessed using the HCP prediction task, though if individual variability is limited (e.g., due to ceiling effects), I would suggest exploring a dataset with a prediction task that has greater behavioral variance.

      This is a good idea, but unfortunately difficult to test with our present data. The HCP gambling task used in our study was not designed to measure individual differences in prediction or inference and likely suffers from ceiling effects. Because the task outcomes are predetermined and not linked to participants' choices, there is very little meaningful behavioral variance in performance to correlate with our resting-state fluctuation measure.

      While we agree that exploring a different dataset with a more suitable task would be ideal, we consider this beyond the scope of the present manuscript. Although these results would be informative, they would ultimately still not be a panacea for the reverse inference issues.

      Or even more broadly, does this variability in resting state PE state fluctuations predict general cognitive abilities like WM and attention (which the HCP dataset also provides)? I appreciate the inclusion of the win-loss control, and I can see the intention to address specificity. This would test whether PE state fluctuations reflect something about general cognition, but also above and beyond these attentional or WM processes that we know are fluctuating.

      This is a helpful suggestion, motivating new analyses: We measured the degree of resting-state fluctuation amplitude across participants and correlated it with the different individual differences measures provided with the HCP data (e.g., measures of WM performance). We computed each participant’s fluctuation amplitude measure as the average absolute difference between posterior-anterior and ventral-dorsal connectivity; this is the average of the TR-by-TR fMRI amplitude measure from Study 3. We correlated this individual difference score with all of the ~200 individual difference measures provided with the HCP dataset (e.g., measures of intelligence or personality). We measured the Spearman correlation between mean fluctuation amplitude with each of those ~200 measures, while correcting for multiple hypotheses using the False Discovery Rate approach.[18]

      We found a robust negative association with age, where older participants tend to display weaker fluctuations (r = -.16, p < .001). We additionally found a positive association with the age-adjusted score on the picture sequence task (r = .12, p<sub>corrected</sub> = .03) and a negative association with performance in the card sort task (r = -.12, p<sub>corrected</sub> = .046). Because fluctuation amplitude shows one positive and one negative association with performance, albeit across entirely different tasks, it is unclear how to interpret these associations without being speculative. We have added these correlation results as Supplemental Materials 8 (SM p. 11):

      “(8) Behavioral differences related to fluctuation amplitude 

      To investigate whether individual differences in the magnitude of resting-state PE-state fluctuations predict general cognitive abilities, we correlated our resting-state fluctuation measure with the cognitive and demographic variables provided in the HCP dataset.

      (8.1) Methods

      For each of the 1,000 participants, we calculated a single fluctuation amplitude score. This score was defined as the average absolute difference between the time-varying posterior-anterior (PA) and ventral-dorsal (VD) connectivity during the resting-state fMRI scan (the average of the TR-by-TR measure used for Study 3). We then computed the Spearman correlation between this score and each of the approximately 200 individual difference measures provided in the HCP dataset. We corrected for multiple comparisons using the False Discovery Rate (FDR) approach.

      (8.2) Results

      The correlations revealed a robust negative association between fluctuation amplitude and age, indicating that older participants tended to display weaker fluctuations (r = -.16, p<sub>corrected</sub> < .001). After correction, two significant correlations with cognitive performance emerged: (i) a positive association with the age-adjusted score on the Picture Sequence Memory Test (r = .12, p<sub>corrected</sub> = .03), (ii) a negative association with performance on the Card Sort Task (r = -.12, p<sub>corrected</sub> = .046). As greater fluctuation amplitude is linked to better performance on one task but worse performance on another, it is unclear how to interpret these findings.”
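
      The correlate-then-correct procedure summarized in Supplemental Materials 8.1 can be illustrated with a short, dependency-free Python sketch. It assumes the Benjamini-Hochberg variant of FDR correction, which the text does not specify, and it uses made-up p-values; the function names are hypothetical and none of the numbers are results from the study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy inputs (hypothetical): one amplitude score and one behavioral measure.
amplitude = [0.31, 0.42, 0.28, 0.55, 0.47]
measure = [12.0, 15.0, 11.0, 19.0, 14.0]
print("Spearman rho:", round(spearman(amplitude, measure), 3))

# Made-up raw p-values standing in for the ~200 tests described in the text.
print("BH-adjusted:", benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      In practice one would typically rely on an established statistics library for both steps; the hand-written versions here are only meant to make the logic of rank-based correlation and step-up correction explicit.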

      We updated the main text Methods to direct readers to this content (p. 39-40):

      “(4.4.3) Links between network fluctuations and behavior

      We considered whether the extent of PE-related network expression states during resting-state is behaviorally relevant. We specifically investigated whether individual differences in the overall magnitude of resting-state fluctuations could predict individual difference measures, provided with the HCP dataset. This yielded a significant association with age, whereby older participants tended to display weaker fluctuations. However, associations with cognitive measures were limited. A full description of these analyses is provided in Supplemental Materials 8.”

      (R3.2b) Within participants: Do momentary increases in PE-network expression during tasks relate to better or faster prediction? In other words, is there evidence that stronger expression of PE-related states is associated with better behavioral outcomes?

      This is a good question that probes the direct behavioral relevance of these network states on a trial-by-trial basis. We agree with the reviewer's intuition; in principle, one would expect a stronger expression of the low-PE network state on trials where a participant correctly and quickly gives a high likelihood rating to a predictable stimulus.

      Following this suggestion, we performed a new analysis in Study 1A to test this. We found that network expression was indeed linked to participants’ likelihood ratings: higher likelihood ratings correspond to stronger posterior-anterior connectivity, whereas lower ratings correspond to stronger ventral-dorsal connectivity (Connectivity-Direction × likelihood, β [standardized] = .28, p = .02). However, this is not a strong test of the reviewer’s hypothesis, and exploratory analyses of response time yielded null results (p > .05). We suspect that the effect is too subtle to detect with our statistical power. A comparable analysis was not feasible for Study 1B, as its design does not provide an analogous behavioral measure of trial-by-trial prediction success.

      (R3.3) A priori Hypothesis for EEG Frequency Analysis.

      It's unclear how to interpret the finding that fMRI fluctuations in the defined modules correlate with frontal Delta/Theta power, specifically in the 3-6 Hz range. However, in the EEG literature, this frequency band is most commonly associated with low arousal, drowsiness, and mind wandering in resting, awake adults, not uniquely with prediction error processing. An a priori hypothesis is lacking here: what specific frequency band would we expect to track spontaneous PE signals at rest, and why? Without this, it is difficult to separate a PE-based interpretation from more general arousal or vigilance fluctuations.

      This point gets to the heart of the challenge with reverse inference in resting-state fMRI. We agree that an interpretation based on general arousal or drowsiness is a potential alternative that must be considered. However, what makes a simple arousal interpretation challenging is the highly specific nature of our fMRI-EEG association. As shown in our confirmatory analyses (Supplemental Materials 6), the correlation with 3-6 Hz power was found exclusively with the absolute difference between our two PE-related network states (|PA – VD|)—a measure of fluctuation amplitude. We found no significant relationship with the signed difference (a bias toward one state) or the sum (the overall level of connectivity). This specificity presents a puzzle for a simple drowsiness account; it seems less plausible that drowsiness would manifest specifically as the intensity of fluctuation between two complex cognitive networks, rather than as a more straightforward change in overall connectivity. While we cannot definitively rule out contributions from arousal, the specificity of our finding provides stronger evidence for a structured cognitive process, like PE, than for a general, undifferentiated state. 

      We updated the Discussion to make the argument above and also to remind readers that alternative explanations, such as ones based on drowsiness, are possible (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism.”

      (R3.4) Significance Assessment

      The significance of the correlation above and all other correlation analyses should be assessed through a permutation test rather than a single parametric t-test against zero. There are a few reasons: a) EEG and fMRI time series are autocorrelated, violating the independence assumption of parametric tests;

      Standard t-tests can underestimate the true null distribution's variance, because EEG-fMRI correlations often involve shared slow drifts or noise sources, which can yield spurious correlations and inflate false positives unless tested against an appropriate null.

      Building a null distribution that preserves the slow drifts, for example, would help us understand how likely it is for the two time series to be correlated when the slow drifts are still present, and how much better the current correlation is, compared to this more conservative null. You can perform this by phase randomizing one of the two time courses N times (e.g., N=1000), which maintains the autocorrelation structure while breaking any true co-occurrence in patterns between the two time series, and compute a non-parametric p-value. I suggest using this approach in all correlation analyses between two time series.

      This is an important statistical point to clarify, and the suggested analysis is valuable. The reviewer is correct that the raw fMRI and EEG time series are autocorrelated. However, because our statistical approach is a two-level analysis, we reasoned that non-independence at the correlation level would not invalidate the higher-level t-test. The t-test’s assumption of independence applies to the individual participants' coefficients, which are independent across participants. Thus, we believe that our initial approach is broadly appropriate, and its simplicity allows it to be easily communicated.
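
      The two-level logic described above can be sketched as follows. This is only an illustration on simulated data, not the authors' pipeline: the function names, the array sizes, the simulated effect size, and the tie-free ranking shortcut are all assumptions made for the example.

```python
import numpy as np

def spearman_r(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

def group_level_t(per_subject_r):
    """One-sample t statistic testing whether the mean coefficient differs from zero."""
    r = np.asarray(per_subject_r, dtype=float)
    return r.mean() / (r.std(ddof=1) / np.sqrt(r.size))

# Level 1: one correlation coefficient per (simulated) participant.
rng = np.random.default_rng(0)
n_subjects, n_timepoints = 20, 300   # hypothetical sizes
coefs = []
for _ in range(n_subjects):
    fmri_amp = rng.standard_normal(n_timepoints)
    eeg_power = 0.2 * fmri_amp + rng.standard_normal(n_timepoints)
    coefs.append(spearman_r(fmri_amp, eeg_power))

# Level 2: the t-test treats the per-participant coefficients as the observations.
t_stat = group_level_t(coefs)
```

      The independence assumption at the second level concerns the entries of `coefs`, not the time points within any one participant's series.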

      Nonetheless, the permutation-testing procedure that the Reviewer describes seems like an important analysis to test, given that permutation-testing is the gold standard for evaluating statistical significance, and it could guarantee that our above logic is correct. We thus computed the analysis as the reviewer described. For each participant, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This was done for each participant once per permutation. Each participant’s phase-randomized data was submitted to the analysis of each oscillatory power band as originally, generating one mean correlation for each band. This was done 1,000 times.
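
      A minimal sketch of the phase-randomization step described above, assuming a single run stored as a one-dimensional NumPy array. The function name `phase_randomize` and the simulated input are hypothetical; this is not the authors' code.

```python
import numpy as np

def phase_randomize(x, rng):
    """Return a real surrogate with the same amplitude spectrum as x but random Fourier phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    surrogate_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    surrogate_spectrum[0] = spectrum[0]        # keep the mean (DC term) unchanged
    if n % 2 == 0:
        surrogate_spectrum[-1] = spectrum[-1]  # the Nyquist term must stay real
    return np.fft.irfft(surrogate_spectrum, n=n)

# Simulated autocorrelated series standing in for one run of |PA - VD|.
rng = np.random.default_rng(0)
series = np.cumsum(rng.standard_normal(256))
surrogate = phase_randomize(series, rng)
```

      Because only the phases change, the surrogate keeps the power spectrum (and hence the autocorrelation) of the input while losing its specific temporal alignment with any other series. For multi-run data, the function would be applied to each run separately.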

      Across the five bands, we find that the grand mean correlation is near zero (M<sub>r</sub> = .0006) and the 97.5<sup>th</sup> percentile critical value of the null distribution is r = ~.025. This 97.5<sup>th</sup> percentile corresponds to the upper end of a 95% confidence interval for a band’s correlation, and the threshold differs minimally across bands (.024 < rs < .026). Our original correlation coefficients for Delta (M<sub>r</sub> = .042) and Theta (M<sub>r</sub> = .041), which our conclusions focused on, remained significant (p ≤ .002). We can also perform family-wise error-rate correction, as previously requested in Reviewer comment R1.4c, by taking the highest correlation across any band for a given permutation; the Delta and Theta effects remain significant under this correction (p<sub>FWE-corrected</sub> ≤ .003).

      These correlations were previously reported in Table 1, and we updated the caption to note what effects remain significant when evaluated using permutation-testing and with family-wise error correction (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”

      We updated the Methods to describe the permutation-testing analysis (p. 43):

      “To confirm the significance of our fMRI-EEG correlations with a non-parametric approach, we performed a group-level permutation-test. For each of 1,000 permutations, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This procedure breaks the true temporal relationship between the fMRI and EEG data while preserving its structure. We then re-computed the mean Spearman correlation for each frequency band using this phase-randomized data. We evaluated significance using a family-wise error correction approach that accounts for us analyzing five oscillatory power bands. We thus create a null distribution composed of the maximum correlation value observed across all frequency bands from each permutation. Our observed correlations were then tested for significance against this distribution of maximums.”
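
      The "distribution of maximums" in the Methods text above can be sketched as below. This assumes the null correlations are already arranged as a permutations-by-bands array. The function name, the one-sided upper-tail test, and the add-one p-value convention are assumptions for illustration, and the numbers are made up rather than taken from the study.

```python
import numpy as np

def max_stat_p_values(observed, null):
    """FWE-corrected one-sided p-values from a max-statistic null distribution.

    observed: shape (n_bands,), the observed mean correlation per band.
    null:     shape (n_permutations, n_bands), mean correlations from surrogate data.
    """
    observed = np.asarray(observed, dtype=float)
    null = np.asarray(null, dtype=float)
    max_null = null.max(axis=1)  # one maximum across bands per permutation
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    # Add-one convention keeps p-values strictly above zero (an assumption here).
    return (exceed + 1) / (max_null.size + 1)

# Tiny made-up example: 3 permutations, 2 bands.
null = np.array([[0.10, 0.20],
                 [0.30, 0.00],
                 [0.05, 0.05]])
observed = np.array([0.25, 0.50])
p_fwe = max_stat_p_values(observed, null)
```

      A two-sided variant would take the maximum of the absolute correlations instead.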

      (R3.5) Analysis choices

      If I'm understanding correctly, the algorithm used to identify modules does so by assigning nodes to communities, but it does not itself restrict what edges can be formed from these modules. This makes me wonder whether the decision to focus only on connections between adjacent modules, rather than considering the full connectivity, was an analytic choice by the authors. If so, could you clarify the rationale? In particular, what justifies assuming that the gradient of PE states should be captured by edges formed only between nearby modules (as shown in Figure 2E and Figure 4), rather than by the full connectivity matrix? If this restriction is instead a by-product of the algorithm, please explain why this outcome is appropriate for detecting a global signature of PE states in both task and rest.

      We discuss this matter in our response to comment R2.(4).

      When assessing the correspondence across task-fMRI and rs-fMRI in section 2.2.2, why was the pattern during task calculated from selecting a pair of bilateral ROIs (resulting in a group of eight ROIs), and the resting state pattern calculated from posterior-anterior/ventral-dorsal fluctuation modules? Doesn't it make more sense to align the two measures? For example, calculating task effects on these same modules during task and rest?

      We thank the reviewer for this question, as it highlights a point in our methods that we could have explained more clearly. The reviewer is correct that the two measures must be aligned, and we can confirm that they were indeed perfectly matched.

      For the analysis in Section 2.2.2, both the task and resting-state measures were calculated on the exact same anatomical substrate for each comparison. The analysis iteratively selected a symmetrical subset of eight ROIs from our larger four quadrants. For each of these 3,432 iterations, we computed the task-fMRI PE effect (the Connectivity Direction × PE interaction) and the resting-state fluctuation amplitude (E[|PA – VD|]) using the identical set of eight ROIs. The goal of this analysis was precisely to test if the fine-grained anatomical pattern of these effects correlated within an individual across the task and rest states. We will revise the text in Section 2.2.2 to make this direct alignment of the two measures more explicit.

      Recommendations for authors:

      Reviewer #1 (Recommendations for authors):

      (R1.3) Several prior studies have described co-activation or connectivity "templates" that spontaneously alternate during rest and task states, and are linked to behavioral variability. While they are interpreted differently in terms of cognitive function (e.g., in terms of sustained attention: Monica Rosenberg; alertness: Catie Chang), the relationship between these previously reported templates and those identified in the current study warrants discussion. Are the current templates spatially compatible with prior findings while offering new functional interpretations beyond those already proposed in the literature? Or do they represent spatially novel patterns?

      Thank you for this suggestion. Broadly, we do not mean to propose spatially novel patterns but rather focus on how these are repurposed for PE processing. In the Discussion, we link our identified connectivity states to established networks (e.g., the FPCN). We updated this paragraph to mention that these patterns are largely not spatially novel (p. 20):

      “The connectivity patterns put forth are, for the most part, not spatially novel and instead overlap heavily with prior functional and anatomical findings.”

      Regarding the specific networks covered in the prior work by Rosenberg and Chang that the reviewer seems to be referring to,[7,8] this research has emphasized networks anchored heavily in sensorimotor, subcortical–cerebellar, and medial frontal circuits, which mostly do not overlap with the connectivity effects we put forth.

      (R1.4) Additional points:

      (R1.4a) I do not think that the logic for taking the absolute difference of fMRI connectivity is convincing. What happens if the sign of the difference is maintained?

      Thank you for pointing out this area that requires clarification. Our analysis targets the amplitude of the fluctuation between brain states, not the direction. We define high fluctuation amplitude as moments when the brain is strongly in either the PA state (PA > VD) or the VD state (VD > PA). The absolute difference |PA – VD| correctly quantifies this intensity, whereas a signed difference would conflate these two distinct high-amplitude moments. Our simulation study (Supplemental Materials, Section 5) provides the theoretical validation for this logic, showing how this absolute difference measure in slow fMRI data can track the amplitude of a fast underlying neural oscillator.
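
      A toy illustration of why the signed and absolute differences behave differently when a fast oscillation is summarized over slower windows. This is not the authors' simulation from the Supplemental Materials and it omits any hemodynamic model. The sampling rate, oscillation frequency, envelope, and window length are hypothetical.

```python
import numpy as np

fs = 100                             # samples per second (hypothetical)
t = np.arange(200 * fs) / fs         # 200 seconds of time points
# Slowly varying amplitude of the alternation between the two states.
envelope = 1.0 + 0.8 * np.sin(2 * np.pi * 0.01 * t)
# Signed difference (PA - VD) oscillating at 3 Hz with that amplitude.
signed = envelope * np.sin(2 * np.pi * 3.0 * t)

window = 2 * fs                      # 2-second summary windows (hypothetical)
n_win = t.size // window
signed_win = signed.reshape(n_win, window)
env_win = envelope.reshape(n_win, window).mean(axis=1)

mean_signed = signed_win.mean(axis=1)        # positive and negative phases cancel
mean_abs = np.abs(signed_win).mean(axis=1)   # scales with the oscillation amplitude
r_abs = np.corrcoef(mean_abs, env_win)[0, 1]
```

      In this toy setup the windowed signed difference stays close to zero regardless of amplitude, whereas the windowed absolute difference follows the envelope closely.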

      When the analysis is tested in terms of the signed difference, as suggested by the Reviewer, the association between the fMRI data and EEG power is non-significant for each power band (ps<sub>uncorrected</sub> ≥ .47). We updated Supplemental Materials 6 to include these results. Previously, this section included the fluctuation amplitude (fMRI) × EEG power results while controlling for: (i) the signed difference between posterior-anterior and ventral-dorsal connectivity, (ii) the sum of posterior-anterior and ventral-dorsal connectivity, and (iii) the absolute value of the sum of posterior-anterior and ventral-dorsal connectivity. For completeness, we also now report the correlation between each EEG power band and each of those other three measures (SM, p. 9):

      “We additionally tested the relationship between each of those three measures and the five EEG oscillation bands. Across the 15 tests, there were no associations (ps<sub>uncorrected</sub>  ≥ .04); one uncorrected p-value was at p = .044, although this was expected given that there were 15 tests. Thus, the association between EEG oscillations and the fMRI measure is specific to the absolute difference (i.e., amplitude) measure.”

      (R1.4b) Reasoning of focus on frontal and theta band is weak, and described as "typical" (line 359) based on a single study.

      Sorry about this. There is a rich literature on the link between frontal theta and prediction error,[3,9–11] and we updated the Introduction to include more references to this work (p. 18): “The analysis was first done using power averaged across frontal electrodes, as these are the typical focus of PE research on oscillations.[3,9–11]”

      We have also updated the Methods to cite more studies that motivate our electrode choice (p. 41): “The analyses first targeted five midline frontal electrodes (F3, F1, Fz, F2, F4; BioSemi64 layout), given that this frontal row is typically the focus of executive-function PE research on oscillations.[9–11]”

      (R1.4c) No correction appears to have been applied for the association between EEG power and fMRI connectivity. Given that 100 frequency bins were collapsed into 5 canonical bands, a correction for 5 comparisons seems appropriate. Notably, the strongest effects in the delta and theta bands (particularly at fronto-central electrodes) may still survive correction, but this should be explicitly tested and reported.

      Thanks for this suggestion. We updated the Table 1 caption to mention which results survive family-wise error rate correction. As the reviewer suggests, the Delta/Theta effects would survive Bonferroni correction for five tests. However, per a later comment suggesting that we evaluate statistical significance with a permutation-testing approach (comment R3.4), we instead report family-wise error correction based on that approach. The revised caption is as follows (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”

      (R1.4d) Line 135. Not sure I understand what you mean by "moods". What is the overall point here?

      The overall argument is that the fluctuations occur rapidly rather than slowly. By slow “moods” we refer to how a participant could enter a high anxiety state of >10 seconds, linked to high PE fluctuations, and then shift into a low anxiety state, linked to low PE fluctuations. We argue that this is not occurring. Regardless, we recognize that referring to lengths of time as short as 10 seconds or so is not a typical use of the word “mood” and is potentially ambiguous, so we have omitted this statement, which was originally on page 6: “Identifying subsecond fluctuations would broaden the relevance of the present results, as they rule out that the PE states derive from various moods.”

      (R1.4e) Line 100. "Few prior PE studies have targeted PE, contrasting the hundreds that have targeted BOLD". I don't understand this sentence. It's presumably about connectivity vs activity?

      Yes, sorry about this typo. The reviewer is correct, and that sentence was meant to mention connectivity. We corrected it (p. 5): “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R1.4f) Line 373: "0-0.5Hz" in the caption is probably "0-50Hz".

      Yes, this was another typo, thank you. We have corrected it (p. 19): “… every 0.5 Hz interval from 0-50 Hz.”

      Reviewer #2 (Recommendations for authors):

      (R2.6) (Page 3) When referring to the "limited" hypothesis of local PE, please clarify in what sense is it limited. That statement is unclear.

      Thank you for pointing out this text, which we now see is ambiguous. We originally used "limited" to refer to the hypothesis's constrained scope: namely, that PE is relevant to various low-level operations (e.g., sensory processing or rewards) but the minimization of PE does not guide more abstract cognitive processes. We edited this part of the Introduction to be clearer (p. 3):

      “It is generally agreed that the brain uses PE mechanisms at neuronal or regional levels,[15,16] and this idea has been useful in various low-level functional domains, including early vision [15] and dopaminergic reward processing.[17] Some theorists have further argued that PE propagates through perceptual pathways and can elicit downstream cognitive processes to minimize PE.”

      (R2.7) (Page 5) "Few prior PE have targeted PE"... this statement appears contradictory. Please clarify.

      Sorry about this typo, which we have corrected (p. 5):

      “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R2.8) What happened to the data of the medium PE condition in Study 1A?

      The medium PE condition data were not excluded. We modeled the effect of prediction error on connectivity using a linear regression across the three conditions, coding them as a continuous variable (Low = -1, Medium = 0, High = +1). This approach allowed us to identify brain connections that showed a linear increase or decrease in strength as a function of increasing PE. This linear contrast is a more specific and powerful way to isolate PE-related effects than a High vs. Low contrast. We updated the Results slightly to make this clearer (p. 8-9):

      “In the fMRI data, we compared the three PE conditions’ beta-series functional connectivity, aiming to identify network-level signatures of PE processing, from low to high. […] For the modularity analysis, we first defined a connectome matrix of beta values, wherein each edge’s value was the slope of a regression predicting that edge’s strength from PE (coded as Low = -1, Medium = 0, High = +1; Figure 2A).”
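
      The slope-per-edge computation described in the quoted text can be sketched as an ordinary least-squares slope of edge strength on the PE code. How observations enter the regression (per condition, per trial, or per participant) is not specified here, so the layout below, the function name, and the example values are assumptions.

```python
import numpy as np

def pe_slopes(pe_codes, edge_values):
    """Least-squares slope of each edge's strength on the PE code.

    pe_codes:    shape (n_obs,), e.g. -1 (Low), 0 (Medium), +1 (High).
    edge_values: shape (n_obs, n_edges), connectivity strength per observation.
    """
    x = np.asarray(pe_codes, dtype=float)
    y = np.asarray(edge_values, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean(axis=0)
    return x_centered @ y_centered / (x_centered @ x_centered)

# Made-up example: one observation per condition, two edges.
codes = np.array([-1.0, 0.0, 1.0])
edges = np.array([[0.1, 0.5],
                  [0.2, 0.5],
                  [0.3, 0.1]])
slopes = pe_slopes(codes, edges)
```

      The resulting vector of slopes, one per edge, is what would populate the connectome matrix used for the modularity analysis.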

      (R2.9) (Page 15) The point about how the dots in 6H follow those in 6J better than those in 6I is a little subjective - can the authors provide an objective measure?

      Thank you for pointing out this issue. The visual comparison using Figure 6 was not meant as a formal analysis but rather to provide intuition. However, as the reviewer describes, this is difficult to convey. Our formal analysis is provided in Supplemental Materials 5, where we report correlation coefficients between a very large number of simulated fMRI data points and EEG data points corresponding to different frequencies. We updated this part of the Results to convey this (p. 16-17):

      “Notice how the dots in Figure 6H follow the dots in Figure 6J (3 Hz) better than the dots in Figure 6I (0.5 Hz) or Figure 6K (10 Hz); this visual comparison is intended for illustrative purposes only, and quantitative analyses are provided in Supplemental Materials 5.”

      References

      (1) Zalesky, A., Fornito, A. & Bullmore, E. T. Network-based statistic: identifying differences in brain networks. Neuroimage 53, 1197–1207 (2010).

      (2) Strijkstra, A. M., Beersma, D. G., Drayer, B., Halbesma, N. & Daan, S. Subjective sleepiness correlates negatively with global alpha (8–12 Hz) and positively with central frontal theta (4–8 Hz) frequencies in the human resting awake electroencephalogram. Neuroscience letters 340, 17–20 (2003).

      (3) Cavanagh, J. F. & Frank, M. J. Frontal theta as a mechanism for cognitive control. Trends in cognitive sciences 18, 414–421 (2014).

      (4) Grech, R. et al. Review on solving the inverse problem in EEG source analysis. Journal of neuroengineering and rehabilitation 5, 25 (2008).

      (5) Palva, J. M. et al. Ghost interactions in MEG/EEG source space: A note of caution on inter-areal coupling measures. Neuroimage 173, 632–643 (2018).

      (6) Koles, Z. J. Trends in EEG source localization. Electroencephalography and clinical Neurophysiology 106, 127–137 (1998).

      (7) Rosenberg, M. D. et al. A neuromarker of sustained attention from whole-brain functional connectivity. Nature neuroscience 19, 165–171 (2016).

      (8) Goodale, S. E. et al. fMRI-based detection of alertness predicts behavioral response variability. elife 10, e62376 (2021).

      (9) Cavanagh, J. F. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216 (2015).

      (10) Hoy, C. W., Steiner, S. C. & Knight, R. T. Single-trial modeling separates multiple overlapping prediction errors during reward processing in human EEG. Communications Biology 4, 910 (2021).

      (11) Neo, P. S.-H., Shadli, S. M., McNaughton, N. & Sellbom, M. Midfrontal theta reactivity to conflict and error are linked to externalizing and internalizing respectively. Personality neuroscience 7, e8 (2024).

      (12) Friston, K. J. The free-energy principle: a unified brain theory? Nature reviews neuroscience 11, 127–138 (2010).

      (13) Feldman, H. & Friston, K. J. Attention, uncertainty, and free-energy. Frontiers in human neuroscience 4, 215 (2010).

      (14) Friston, K. J. et al. Active inference and epistemic value. Cognitive neuroscience 6, 187–214 (2015).

      (15) Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extraclassical receptive-field effects. Nature neuroscience 2, 79–87 (1999).

      (16) Walsh, K. S., McGovern, D. P., Clark, A. & O’Connell, R. G. Evaluating the neurophysiological evidence for predictive processing as a model of perception. Annals of the New York Academy of Sciences 1464, 242–268 (2020).

      (17) Niv, Y. & Schoenbaum, G. Dialogues on prediction errors. Trends in cognitive sciences 12, 265–272 (2008).

      (18) Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological) 57, 289–300 (1995).

    1. Multi-Carrier Shipping Software: All you need to know if you are an eCommerce expert

      H1 needs to be updated to include the phrase mini guide

      GFS Mini Guide to Multi-Carrier Shipping Software

      Underneath the H1 we can keep "All you need to know..." etc., but ensure it has no HTML header code.

    1. Reviewer #1 (Public review):

      The authors relate a language model developed to predict whether a given sentence correctly followed another given sentence to EEG recordings in a novel way, showing receptive fields related to widely used TRFs. In these responses (or "regression results"), differences between representational levels are found, as well as differences between attended and unattended speech stimuli, and whether there is hearing loss. These differences are found per EEG channel.

      In addition to these novel regression results, which are apparently captured from the EEG specifically around the sentence stimulus offsets, the authors also perform a more standard mTRF analysis using a software package (Eelbrain) and TRF regressors that will be more familiar to researchers adjacent to these topics, which was highly appreciated for its comparative value. Comparing these TRFs with the authors' original regression results, several similarities can be seen. Specifically, response contrasts for attended versus unattended speaker during mixed speech, for the phoneme, syllable, and sentence regressors, are greater for normal-hearing participants than hearing-impaired participants for both analyses, and the temporal and spatial extents of the significant differences are roughly comparable (left-front and 0 - 200 ms for phoneme and syllable, and left and 200 - 300 ms for sentence).

      The inclusion of the mTRF analysis is helpful also because some aspects of the authors' original regression results, between the EEG data and the HM-LSTM linguistic model, are less than clear. The authors state specifically that their regression analysis is only calculated in the -100 - 300 ms window around stimulus/sentence offsets. They clarify that this means that most of the EEG data acquired while the participants are listening to the sentences is not analyzed, because their HM-LSTM model implementation represents all acoustic and linguistic features in a condensed way, around the end of the sentence. Thus the regression between data and model only occurs where the model predictions exist, which is the end of the sentences. This is in contrast to the mTRF analysis, which seems to have been done in a typical way, regressing over the entire stimulus time, because those regressors (phoneme onset, word onset, etc.) exist over the entire sentence time. If my reading of their description of the HM-LSTM regression is correct, it is surprising that the regression weights are similar between the HM-LSTM model and the mTRF model.

      However, the code that the authors uploaded to OSF seems to clarify this issue. In the file ridge_lstm.py, the authors construct the main regressor matrices called X1 and X2 which are passed to sklearn to do the ridge regression. This ridge regression step is calculated on the continuous 10-minute bouts of EEG and stimuli, and it is calculated in a loop over lag times, from -100 ms to 300 ms lag. These regressor matrices are initialized as zeros, and are then filled in two steps: the HM_LSTM model unit weights are read from numpy files and written to the matrices at one timepoint per sentence (as the authors describe in the text), and the traditional phoneme, syllable, etc. annotations are ALSO read in (from csv files) and written to the matrices, putting 1s at every timepoint of those corresponding onsets/offsets. Thus the actual model regressor matrix for the authors' main EEG results includes BOTH the HM_LSTM model weights for each sentence AND the feature/annotation times, for whichever of the 5 features is being analyzed (phonemes, syllables, words, phrases, or sentences).

      So for instance, for the syllable HM_LSTM regression results, the regressor matrix contains: 1) the HM_LSTM model weights corresponding to syllables (a static representation, placed once per sentence offset time), AND 2) the syllable onsets themselves, placed as a row of 1s at every syllable onset time. And as another example, for the word HM_LSTM regression results, the regressor matrix contains: 1) the HM_LSTM model weights corresponding to words (a static representation, placed once per sentence offset time), AND 2) the word onsets themselves, placed as a row of 1s at every word onset time.
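
      The regressor layout described above, under this reading of the code, can be sketched as follows. This is not the authors' `ridge_lstm.py`: the function names, the sampling rate, the simulated data, and the closed-form ridge solver (used here in place of sklearn) are assumptions for illustration.

```python
import numpy as np

def build_regressors(n_samples, sentence_offsets, sentence_weights, feature_onsets):
    """Zero matrix with model unit weights at sentence offsets plus one onset-impulse column."""
    n_units = sentence_weights.shape[1]
    X = np.zeros((n_samples, n_units + 1))
    X[sentence_offsets, :n_units] = sentence_weights  # one row of weights per sentence
    X[feature_onsets, n_units] = 1.0                  # 1s at every feature onset
    return X

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge solution: (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def lagged_ridge(X, eeg, lags, alpha=1.0):
    """Fit one ridge model per lag (in samples); a positive lag means EEG follows the stimulus."""
    coefs = []
    for lag in lags:
        shifted = np.roll(X, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0.0   # discard rows that wrapped around from the end
        elif lag < 0:
            shifted[lag:] = 0.0   # discard rows that wrapped around from the start
        coefs.append(ridge_fit(shifted, eeg, alpha))
    return np.array(coefs)        # shape (n_lags, n_regressors)

# Simulated example at a hypothetical 100 Hz: lags of -10..30 samples span -100..300 ms.
rng = np.random.default_rng(0)
n_samples = 1000
feature_onsets = np.arange(50, 950, 40)
sentence_offsets = np.array([200, 500, 800])
sentence_weights = rng.standard_normal((3, 4))
X = build_regressors(n_samples, sentence_offsets, sentence_weights, feature_onsets)

eeg = 0.01 * rng.standard_normal(n_samples)
eeg[feature_onsets + 10] += 1.0   # simulated response 100 ms after each onset
lags = np.arange(-10, 31)
coefs = lagged_ridge(X, eeg, lags)
```

      In this simulation the onset column alone recovers the response at the 100 ms lag, which illustrates why the contribution of the model-weight columns cannot be read off from the combined fit.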

      If my reading of the code is correct, there are two main points of clarification for interpreting these methods:

      First, the authors' window of analysis of the EEG is not "limited" to 400 ms as they say; rather the time dimension of both their ridge regression results and their traditional mTRF analysis is simply lags (400 ms-worth), and the responses/receptive fields are calculated over the entire 10-minute trials. This is the normal way of calculating receptive fields in a continuous paradigm. The authors seem to be focusing on the peri-sentence offset time points because that is where the HM_LSTM model weights are placed in the regressor matrix. Also because of this issue, it is not really correct when the authors say that some significant effect occurred at some latency "after sentence offset". The lag times of the regression results should have the traditional interpretation of lag/latency in receptive field analyses.

      Second, as both the traditional linguistic feature annotations and the HM_LSTM model weights are part of the regression for the main ridge regression results here, it is not known what the contribution specifically of the HM_LSTM portion of the regression was. Because the more traditional mTRF analysis showed many similar results to the main ridge regression results here, it seems probable that the simple feature annotations themselves, rather than the HM_LSTM model weights, are responsible for the main EEG results. A further analysis separating these two sets of regressors would shed light on this question.

    1. Reviewer #2 (Public review):

      Summary:

      This very ambitious project addresses one of the core questions in visual processing related to the underlying anatomical and functional architecture. Using a large sample of rare and high-quality EEG recordings in humans, the authors assess whether face-selectivity is organised along a posterior-anterior gradient, with selectivity and timing increasing from posterior to anterior regions. The evidence suggests that it is the case for selectivity, but the data are more mixed about the temporal organisation, which the authors use to conclude that the classic temporal hierarchy described in textbooks might be questioned, at least when it comes to face processing.

      Strengths:

      A huge amount of work went into collecting this highly valuable dataset of rare intracranial EEG recordings in humans. The data alone are valuable, assuming they are shared in an easily accessible and documented format. Currently, the OSF repository linked in the article is empty, so no assessment of the data can be made. The topic is important, and a key question in the field is addressed. The EEG methodology is strong, relying on a well-established and high SNR SSVEP method. The method is particularly well-suited to clinical populations, leading to interpretable data in a few minutes of recordings. The authors have attempted to quantify the data in many different ways and provided various estimates of selectivity and timing, with matching measures of uncertainty. Non-parametric confidence intervals and comparisons are provided. Collectively, the various analyses and rich illustrations provide superficially convincing evidence in favour of the conclusions.

      Weaknesses:

      (1) The work was not pre-registered, and there is no sample size justification, whether for participants or trials/sequences. So a statistical reviewer should assess the sensitivity of the analyses to different approaches.

      (2) Frequentist NHST is used to claim lack of effects, which is inappropriate, see for instance:

      Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350. https://doi.org/10.1007/s10654-016-0149-3

      Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016). Is There a Free Lunch in Inference? Topics in Cognitive Science, 8(3), 520-547. https://doi.org/10.1111/tops.12214

      (3) In the frequentist realm, demonstrating similar effects between groups requires equivalence testing, with bounds (minimum effect sizes of interest) that should be pre-registered:

      Campbell, H., & Gustafson, P. (2024). The Bayes factor, HDI-ROPE, and frequentist equivalence tests can all be reverse engineered-Almost exactly-From one another: Reply to Linde et al. (2021). Psychological Methods, 29(3), 613-623. https://doi.org/10.1037/met0000507

      Riesthuis, P. (2024). Simulation-Based Power Analyses for the Smallest Effect Size of Interest: A Confidence-Interval Approach for Minimum-Effect and Equivalence Testing. Advances in Methods and Practices in Psychological Science, 7(2), 25152459241240722. https://doi.org/10.1177/25152459241240722
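
      The confidence-interval formulation of equivalence testing referred to in point (3) can be sketched as follows. This is a minimal illustration, not the reviewer's or the authors' procedure: the function name, the ±10 ms bounds and the example intervals are all hypothetical, and the interval passed in is assumed to be a 90% CI (which corresponds to two one-sided tests at the 5% level).

```python
def equivalence_decision(ci_low, ci_high, lower_bound, upper_bound):
    """Classify an effect from its confidence interval and equivalence bounds.

    Hypothetical helper: bounds are the smallest effect size of interest,
    which should be fixed in advance (ideally pre-registered).
    """
    if lower_bound < ci_low and ci_high < upper_bound:
        # The whole interval sits inside the bounds: support for equivalence.
        return "equivalent"
    if ci_high < lower_bound or ci_low > upper_bound:
        # The whole interval sits outside the bounds: a meaningful difference.
        return "not equivalent"
    # The interval straddles a bound: the data cannot settle the question.
    return "inconclusive"


# Hypothetical onset differences between two regions, in ms.
print(equivalence_decision(-4.0, 6.0, -10.0, 10.0))    # narrow interval
print(equivalence_decision(-20.0, 25.0, -10.0, 10.0))  # wide interval
```

      The second call shows why a non-significant difference is not evidence of equivalence: an interval that includes zero but extends beyond the bounds is simply inconclusive.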

      (4) The lack of consideration for sample sizes, the lack of pre-registration, and the lack of a method to support the null (a cornerstone of this project, which aims to demonstrate equivalent onsets between areas) suggest that the work is exploratory. This is a strength: we need rich datasets to explore, test tools and generate new hypotheses. I strongly recommend embracing the exploration philosophy and removing all inferential statistics: instead, provide even more detailed graphical representations (include onset distributions) and share the data immediately with all the pre-processing and analysis code.

      (5) Even if the work was pre-registered, it would be very difficult to calculate p-values conditional on all the uncertainty around the number of participants, the number of contacts and the number of trials, as they are random variables, and sampling distributions of key inferences should be integrated over these unknown sources of variability. The difficulty of calculating/interpreting p-values that are conditional on so many pre-processing stages and sources of uncertainty is traditionally swept under the rug, but nevertheless well documented:

      Kruschke, J.K. (2013) Bayesian estimation supersedes the t test. J Exp Psychol Gen, 142, 573-603. https://pubmed.ncbi.nlm.nih.gov/22774788/

      Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779-804. https://doi.org/10.3758/BF03194105

      (6) Currently, there is no convincing evidence in the article to clearly support the main claims.

      Bootstrap confidence intervals were used to provide measures of uncertainty. However, the bootstrapping did not take the structure of the data into account, collapsing across important dependencies in that nested structure: participants > hemispheres > contacts > conditions > trials.

      Ignoring data dependencies and the uncertainty from trials could lead to a distorted CI. Sampling contacts with replacement is inappropriate because it breaks the structure of the data, mixing degrees of freedom across different levels of analysis. The key rule of the bootstrap is to follow the data acquisition process, and therefore, sampling participants with replacement should come first. In a hierarchical bootstrap, the process can be repeated at nested levels, so that for each resampled participant, then contacts are resampled (if treated as a random variable), then trials/sequences are resampled, keeping paired measurements together (hemispheres, and typically contacts in a standard EEG experiment with fixed montage). The same hierarchical resampling should be applied to all measurements and inferences to capture all sources of variability. Selectivity and timing should be quantified at each contact after resampling of trials/sequences before integrating across hemispheres and participants using appropriate and justified summary measures.
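
      The nested resampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data layout (`{participant: {contact: [trial values]}}`), the function names and the default summary (the mean) are hypothetical, contacts are treated as a random variable, and hemispheres are not modelled (paired measurements would need to be resampled together).

```python
import random
from statistics import mean


def hierarchical_bootstrap(data, n_boot=1000, summary=mean, seed=0):
    """Bootstrap a group-level summary, resampling level by level.

    data: {participant: {contact: [trial values]}} (hypothetical layout).
    Returns a list of n_boot bootstrap estimates.
    """
    rng = random.Random(seed)
    participants = list(data)
    estimates = []
    for _ in range(n_boot):
        participant_values = []
        # Level 1: resample participants with replacement.
        for p in rng.choices(participants, k=len(participants)):
            contacts = list(data[p])
            contact_values = []
            # Level 2: resample contacts within the resampled participant.
            for c in rng.choices(contacts, k=len(contacts)):
                trials = data[p][c]
                # Level 3: resample trials, then quantify at the contact.
                resampled = rng.choices(trials, k=len(trials))
                contact_values.append(summary(resampled))
            participant_values.append(summary(contact_values))
        # Integrate across participants last.
        estimates.append(summary(participant_values))
    return estimates


def percentile_interval(estimates, alpha=0.05):
    """Percentile interval from a list of bootstrap estimates."""
    s = sorted(estimates)
    low = s[int(round(alpha / 2 * len(s)))]
    high = s[int(round((1 - alpha / 2) * len(s))) - 1]
    return low, high
```

      Because participants are resampled first, the width of the resulting interval reflects the number of participants rather than the much larger number of contacts or trials.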

      The authors already recognise part of the problem, as they provide within-participant analyses. This is a very good step, inasmuch as it addresses the issue of mixing up degrees of freedom across levels, but unfortunately these analyses are plagued with small sample sizes, making claims about the lack of differences even more problematic: the classic fallacy of treating lack of evidence as evidence of absence. In addition, there seem to be discrepancies between the mean and CI in some cases: 15 [-20, 20]; 8 [-24, 24].

      (7) Three other issues related to onsets:

      (a) FDR correction typically doesn't allow localisation claims, similarly to cluster inferences:

      Winkler, A. M., Taylor, P. A., Nichols, T. E., & Rorden, C. (2024). False Discovery Rate and Localizing Power (No. arXiv:2401.03554). arXiv. https://doi.org/10.48550/arXiv.2401.03554

      Rousselet, G. A. (2025). Using cluster-based permutation tests to estimate MEG/EEG onsets: How bad is it? European Journal of Neuroscience, 61(1), e16618. https://doi.org/10.1111/ejn.16618

      (b) Percentile bootstrap confidence intervals are inaccurate when applied to means. Alternatively, use a bootstrap-t method, or use the percentile bootstrap in conjunction with a robust measure of central tendency, such as a trimmed mean.

      Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2021). The Percentile Bootstrap: A Primer With Step-by-Step Instructions in R. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920911881. https://doi.org/10.1177/2515245920911881
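
      The second option in point (b), a percentile bootstrap combined with a trimmed mean, can be sketched as follows. The function names, the 20% trimming proportion (a common default) and the onset values are assumptions made for illustration only; no data from the article are used.

```python
import random


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of values."""
    s = sorted(values)
    g = int(proportion * len(s))  # number of values trimmed from each tail
    kept = s[g:len(s) - g]
    return sum(kept) / len(kept)


def percentile_bootstrap_ci(values, estimator=trimmed_mean, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `estimator`."""
    rng = random.Random(seed)
    # Resample with replacement and recompute the estimator each time.
    boots = sorted(
        estimator(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    low = boots[int(round(alpha / 2 * n_boot))]
    high = boots[int(round((1 - alpha / 2) * n_boot)) - 1]
    return low, high


# Hypothetical onsets in ms, including one extreme value.
onsets = [92, 95, 98, 101, 103, 105, 108, 110, 112, 240]
print(trimmed_mean(onsets))
print(percentile_bootstrap_ci(onsets))
```

      With 20% trimming, the extreme value of 240 ms is discarded from every full-sample estimate, so it has far less influence on the interval than it would have on an interval for the mean.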

      (c) Defining onsets based on an arbitrary "at least 30 ms" rule is not recommended:

      Piai, V., Dahlslätt, K., & Maris, E. (2015). Statistically comparing EEG/MEG waveforms through successive significant univariate tests: How bad can it be? Psychophysiology, 52(3), 440-443. https://doi.org/10.1111/psyp.12335

      (8) Figure 5 and matching analyses: There are much better tools than correlations to estimate connectivity and directionality. See for instance:

      Ince, R. A. A., Giordano, B. L., Kayser, C., Rousselet, G. A., Gross, J., & Schyns, P. G. (2017). A statistical framework for neuroimaging data analysis based on mutual information estimated via a Gaussian copula. Human Brain Mapping, 38(3), 1541-1573. https://doi.org/10.1002/hbm.23471

      (9) Pearson correlation is sensitive to other features of the data than an association, and is maximally sensitive to linear associations. Interpretation is difficult without seeing matching scatterplots and getting confirmation from alternative robust methods.
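
      A toy example illustrates the concern in point (9). The data below are invented: eight points with a perfect negative association plus one extreme point. The helper names are hypothetical, and the rank-based (Spearman) correlation is used only as one simple alternative; it is not fully robust either, which is why scatterplots remain essential.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Invented data: a perfect negative trend plus a single extreme point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [8, 7, 6, 5, 4, 3, 2, 1, 20]
print(pearson(x, y), spearman(x, y))
```

      Here the single extreme point is enough to make the Pearson coefficient positive while the rank-based coefficient stays negative, so the two summaries disagree even on the sign of the association.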

    1. Reviewer #3 (Public review):

      Summary:

      S. Keeley and collaborators propose a computational approach to infer time-varying latent variables directly from calcium traces (for instance, obtained with 2p imaging) without the need for deconvolving the traces into spike trains in a preliminary, independent step. Their approach rests on one of three families of latent models (GPFA, HMM and dynamical systems), which they augment with an observation model that maps latent variables to fluorescence traces. They validate their approach on simulated and real data, showing that the approach improves latent variable inference and model fitting, compared to more traditional approaches (although not directly compared with the two-step one; see below). They provide a GitHub repository with code to fit their models (which I have not tested).

      Strengths:

      The approach is sound and well-motivated. The authors are specialists in latent variable models. The manuscript is succinct, well-written, and the figures are clear. I particularly liked the diversity of latent models considered, in particular latent models with continuous (GPFA) vs. discrete (HMM) dynamics, which are useful for characterizing different types of neural computations. The validation on both simulated and real data is convincing.

      Weaknesses:

      The main weakness that I see is that the approach is tested only on a single real dataset (odor response dataset). The other model fits are obtained from simulated data. While the results are convincing, it would be useful to see the approach tested on other datasets, for instance, datasets with different brain areas, different behavioral conditions, or different calcium indicators. This would help assess the generality of the approach and its robustness to different experimental conditions.

      The other points below mostly pertain to clarifications and possible extensions of the approach, and to simple model recovery experiments that would help quantify the advantage of the proposed approach over more traditional ones.

      I have a question related to interpretability and diagnosis of model fits. One advantage of the two-step approach, (1) deconvolution => (2) latent variable inference, is that one can inspect the quality of the deconvolution step independently from the latent variable inference step. In the proposed approach, it seems more difficult to diagnose potential problems with model fitting. For instance, if the inferred latent variables are not interpretable, how can one determine whether this is due to a poor choice of latent model (e.g., HMM with too few states), or a poor fit of the observation model (e.g., wrong parameters for the calcium dynamics)? Are there any diagnostic tools that could help identify potential problems with model fitting?

      Could the authors comment on whether their approach allows for instance to compare different forms of latent models (e.g., HMM vs. GPFA) in terms of model evidence, cross-validated log-likelihood or other model comparison metrics? This would be useful to quantitatively determine which type of latent dynamics is more appropriate for a given dataset.

      The HMM part reveals a pretty large number of states, with one state being interpretable (evoked response). Shouldn't we expect a simpler scenario, with 2 states? I know this is a difficult question that is more general and common with HMM approaches, but it would be useful to discuss this point. For instance, would a hierarchical HMM (with a smaller number of "super-states") be more appropriate here?

      While it certainly makes sense that models accounting for the full transformation of latent => spikes => fluorescence data should outperform the two-step approach of (1) deconvolution => (2) latent variable inference, the amount of improvement is not clear. A direct comparison (e.g., with parameter and model recovery metrics) between the two approaches on simulated data would be useful to quantify the advantage of the proposed approach over more traditional ones.

      It would be useful to discuss the possible extension of the approach to other types of data that are related to neural activity but have different observation models, e.g., voltage imaging, or neuromodulator sensors (e.g., GRAB-NE, dLight, etc). Do the authors see any specific challenges that would arise in these cases and that would need to be addressed in the future (other than changing the Poisson spiking part)?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      We thank the reviewer for highlighting the timing of the pharmacological intervention as a strength for this study and for the suggested improvements for clarification.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      (1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      Our findings call for a revision of theories on how catecholamines are involved in the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory to be able to incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      (2) Analytic clarity: what's c^2?

      C^2 appears to be a PDF conversion error: all chi-squares (χ²) were converted to C2. This is now corrected in our revision.

      Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret this as an effect on cognitive control of Pavlovian biasing of actions and decision making rather than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of the different modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between PIT effects under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      We thank the reviewer for highlighting the experimental design as a strength for this study and the suggested improvements for clarification.

      Weaknesses:

      As the authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      We employ a pharmacological intervention within a randomized placebo controlled cross-over design, which allows for causal inferences with respect to the placebo-controlled intervention. Thus, the reported interactions of interest include correlations, but these are causally dependent on our intervention.

      Perhaps the reviewer refers to the implications of our findings for hypotheses regarding neural implementation of Pavlovian bias-generation. Indeed, based on our data we are not able to arbitrate between frontal and striatal accounts, due to the systemic nature of the pharmacological intervention. Thus, we agree with the reviewer that neuroimaging (in combination with for example brain stimulation) would be a valuable next step to identify the neural correlates to these pharmacological intervention effects, to dissociate between frontal and striatal basis of the effects. In the revision, as per our reply to reviewer 1, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      As recommended, we brought forward parts of the Discussion that clarify the originality of the current experiment to the Introduction (page 4/5) and Results section (page 8).

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      We have now clarified that working memory span was assessed for all participants on Day 2 prior to the start of instrumental training (as illustrated in Figure 1A). Importantly, this was done prior to ingestion of the drug or placebo (which subjects received after Pavlovian training, which followed the instrumental training). This design also precludes an assessment of the effects of MPH on working memory capacity.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

      We indeed focus our Discussion more on dopamine than on noradrenaline. Our revision now also discusses noradrenaline in light of our frontal control hypothesis and the recommendation, in future studies, to use a multi-drug design, incorporating, for example, a session with the drug atomoxetine, which modulates cortical catecholamines, but not striatal dopamine (Discussion, page 12).

      Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well-established cognitive task that measures the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inferences about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within-subject design to study the effect of the drug on 100 participants, which also allows detection of continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account cannot explain the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      We thank the Reviewer for highlighting the robustness of the methods and the importance of the results. We are glad to briefly address the concerns here and have incorporated these in our revision.

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study for which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.

      Most importantly, results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: χ² = 9.5, p=0.002). We now report these findings in the revised manuscript (Supplemental Results: Control analyses, page 28).

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

      We agree with the Reviewer that the lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum, as measured with [¹⁸F]-FDOPA PET imaging, lends support to the proposed hypothesis incorporating a broader perspective on Pavlovian bias generation than the dopaminergic direct/indirect pathway account (although it is possible that the association will hold in a larger sample when synthesis capacity is measured with [¹⁸F]-FMT PET imaging, which is sensitive to a different component of the metabolic pathway). We will indeed incorporate in our planned revision the findings from our group reported in van den Bosch et al. (2022).

      See Supplemental methods 2: Working memory and impulsivity assessment, page 26.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Theoretical clarity. Some aspects of the paper are ideally clear: Figure 1 clearly explains the paradigm. The general take-home message is clearly described in the last line of the abstract, the last line of the introduction, the first line of the discussion, and throughout other places in the discussion. Yet the authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      The discussion includes many possible theoretical interpretations of the findings, which is laudable, but many readers may get lost in this multitude (particularly anyone who isn't an RL/DA aficionado). The group's prior work (i.e. striatal hypothesis) is first described, followed by a rather complex breakdown of valence-action tendencies, then the seemingly preferred explanation for the current study (i.e. cognitive control hypothesis) is advanced as "an alternative account ...". This is followed by a third, more complex idea (i.e. cortico-striatal balance hypothesis), then the paper ends. A reader may be forgiven for skimming through this discussion and not having a clear idea of how to frame these effects. I think some subheaders would help, as well as clearer labeling of the theoretical interpretations in line with a more authoritative description of the authors' preferred interpretation of the empirical effects.

      Our findings call for a revision of theories on how catecholamines are involved in the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory to be able to incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      (2) All statistical effects are presented as c^2 with no df. The methods only describe LMER and make no mention of what the c^2 measure represents.

      C^2 appears to be a PDF conversion error: all chi-squares (χ²) were converted to C2. This is now corrected in our revision.

      Reviewer #2 (Recommendations for the authors):

      Few minor points:

      Figure 2A is not cited in the text I think

      Checked and changed.

      Figure 2C: "C" is not present in the figure. Also, I could not see the data corresponding to the MPH-Approach context in the Neutral Pavlovian condition, but I think it is probably masked by another curve.

      Checked and changed. Indeed, the one curve is masked by the other curve.

      As I stated in the public review, a clarification or more detailed analysis of working memory performance depending on if it was measured under MPH or placebo could be a plus.

      Changed this (see public review reply).

      I did not see any statement about the availability of data but I may have missed it.

      Yes, the statement can be found:

      Methods, page 13: Data and code for the study are freely available at https://data.ru.nl/collections/di/dccn/DSC_3017031.02_734.

      Reviewer #3 (Recommendations for the authors):

      The authors should check that inclusion of impulsivity in the logistic mixed model is justified and if it is justified make sure that multicollinearity is not problematic.

      See answer to public review for convenience reiterated below:

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study for which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.

      Most importantly, results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: χ² = 9.5, p=0.002). We now report these findings in the revised manuscript (Supplemental Results: Control analyses, page 28).

      I would recommend that the authors make clear that the effects of methylphenidate are dependent on working memory capacity in the first sentence of the penultimate paragraph of the introduction on page 4.

      Changed this accordingly, see Introduction, page 5.

      I would make sure that the text in the figures is readable without needing to enlarge the figures. I would also highlight the significant effects in the figures.

      We changed the font size accordingly and added significance statements to the caption, because depicting the significance of a four-way interaction including one continuous variable is not straightforward.

      The distributions of p(Go) by conditions such as in figure 1D or 2A are very intuitive. Figure 2B is very informative as it shows the continuous effects of working memory capacity on the PIT effect. I would add (in figure 2 or in the supplement) a plot of the p(Go) with a tertile split based on working memory. Considering that the correspondent analysis is being reported, having the plot would strengthen and simplify the understanding of the results.

      The continuous effects of working memory are based on WM values on the listening span ranging from 2.5-7, in steps of 0.5, resulting in 10 different values. A tertile split would result in binning these into two bins of three values, and one bin of four values. Given that all of the datapoints for this tertile split are already presented in the current figures, we strongly prefer not to include this additional figure.
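      As a quick arithmetic check of the binning described above, the sketch below enumerates the listening-span scores from 2.5 to 7 in steps of 0.5 and splits the ten distinct values into three contiguous, near-equal groups. Splitting over distinct values rather than over participants, and the helper name `split_into_bins`, are assumptions made purely for illustration.

```python
# Distinct listening-span scores: 2.5 to 7.0 in steps of 0.5 (ten values).
wm_values = [2.5 + 0.5 * k for k in range(10)]


def split_into_bins(values, n_bins):
    """Split a sorted list into n_bins contiguous groups of near-equal size."""
    base, extra = divmod(len(values), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        # The first `extra` bins receive one additional value.
        size = base + (1 if b < extra else 0)
        bins.append(values[start:start + size])
        start += size
    return bins


# Illustrative only: a real tertile split would use participants' scores.
tertiles = split_into_bins(wm_values, 3)
bin_sizes = sorted(len(b) for b in tertiles)
print(wm_values[0], wm_values[-1], bin_sizes)
```

      With ten distinct scores, any such split necessarily yields two bins of three values and one bin of four.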

      I would add some sentences in the results section (and maybe in the discussion if needed) addressing the results that the effect of Valence by drug by WM span is only significant in the withdrawal context but not in the approach context.

      We have now emphasized in the Results section (page 8) that the drug effects are significant specifically in the withdrawal context.

    1. But artificial intelligence threatens to become our new “other” – a silent authority that guides our thoughts and actions. We are in danger of ceding the hard-won courage to think for ourselves – and this time, not to gods or kings, but to code.

      Our brains are shrinking; machines are doing the work for us

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The well-performed and clearly presented simulations will be of interest to those studying gene expression in the context of chromatin. While the study is generally solid, it could benefit from a more direct comparison with existing experimental data sets as well as further discussion of the limits of the underlying model assumptions.

      We thank the editors for their overall positive assessment. In response to the Referees’ comments, we have addressed all technical points, including a more detailed explanation of the methodology used to extract gene transcription from our simulations and its analogy with real gene transcription. Regarding the potential comparison with experimental data and our mixing–demixing transition, we have added new sections discussing the current state of the art in relevant experiments. We also clarify the present limitations that prevent direct comparisons, which we hope can be overcome with future experiments using the emerging techniques.

      Reviewer #1 (Public Review):

      This manuscript discusses from a theory point of view the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription. I would only advise the authors to polish and shorten their text to better highlight their key findings and make it more accessible to the reader.

      We thank the Referee for carefully reading our manuscript and recognizing its scientific value. As suggested, we tried to better highlight our key findings and make the text more accessible while addressing also the comments from the other Referees.

      Reviewer #2 (Public Review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript ”Cluster size determines morphology of transcription factories in human cells”.

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well-prepared, and the text is mostly well-written save a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data that are needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the suggested experiments in the Discussion section are actually done and their results can be exploited. In my judgement, unless these additional analyses are added to a level that crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is more fitted for a theoretical biophysics venue than for a biology journal such as eLife.

      We thank the Reviewer for their positive assessment of the soundness of our work and its contribution to the field. We have added a paragraph to the Conclusions highlighting the current state of experimental techniques and outlining near-term experiments that could be extended to test our predictions. We also emphasise that our analysis builds on state-of-the-art polymer models of chromatin and on quantitative experimental datasets, which we used both to construct the model and to validate its outcomes (gene activity). We hope this strengthened link to experiment will catalyse further studies in the field.

      Major points:

      (1) My first point concerns terminology. The Merriam-Webster dictionary describes morphology as the study of structure and form. In my understanding, none of the analyses carried out in this study actually address the form or spatial structuring of transcription factories. I see no aspects of shape, only size. Unless the authors want to assess actual shapes of clusters, I would recommend to instead talk about only their size/extent. The title is, by the same argument, in my opinion misleading as to the content of this study.

      We agree with the Referee that the title could be misleading. In our study we characterized cluster size, which is a morphological descriptor, and cluster composition, which is not morphology per se but is referred to as such in the community in a broader sense. Nevertheless, to strengthen the message we have changed the title to: “Cluster size determines internal structure of transcription factories in human cells”

      (2) Another major conceptual point is the choice of how a single TF:pol particle in the model relates to actual macromolecules that undergo clustering in the cell. What about the fact that even single TF factories still contain numerous canonical transcription factors, many of which are also known to undergo phase separation? Mediator, CDK9, Pol II just to name a few. This alone already represents phase separation under the involvement of different species, which must undergo mixing. This is conceptually blurred with the concept of gene-specific transcription factors that are recruited into clusters/condensates due to sequence-specific or chromatin-epigenetic-specific affinities. Also, the fact that even in a canonical gene with a “small” transcription factory there are numerous clustering factors takes even the smallest factories into a regime of several tens of clustering macromolecules. It is unclear to me how this reality of clustering and factory formation in the biological cell relates to the cross-over that occurs at approximately n=10 particles in the simulations presented in this paper.

      This is a good point. However, in our case we can look at either clustering transcription factors or transcription units. In an experimental situation, transcription units could be “coloured”, or assigned different types, by looking at different cell types, so that they can be classified as housekeeping (cell-type independent) or cell-type specific. This is similar to how DHS can be clustered. In this way the mixing or demixing state can be identified by looking at the type of transcription unit, removing any ambiguity due to the fact that the same protein may participate in different TF complexes.

      (3) The paper falls critically short in referencing and exploiting for analysis existing literature and published data both on 3D genome organization as well as the process of cluster formation in relation to genomic elements. In terms of relevant literature, most of the relevant body of work from the following areas has not been included:

      (i) mechanisms of how the clustering of Pol II, canonical TFs, and specific TFs is aided by sequence elements and specific chromatin states

      (ii) mechanisms of TF selectivity for specific condensates and target genomic elements

      (iii) most crucially, existing highly relevant datasets that connect 3D multi-point contacts with transcription factor identity and transcriptional activity, which would allow the authors to directly test their hypotheses by analysis of existing data

      Here, especially the data under point (iii) are essential. The SPRITE method (cited but not further exploited by the authors), even in its initial form of publication, would have offered a data set to critically test the mixing vs. demixing hypothesis put forward by the authors. Specifically, the SPRITE method offers ordered data on k-mers of associated genomic elements. These can be mapped against the main TFs that associate with these genomic elements, thereby giving an account of the mixed / demixed state of these k-mer associations. Even a simple analysis sorting these associations by the number of associated genomic elements might reveal a demixing transition with increasing association size k. However, a newer version of the SPRITE method already exists, which combines the k-mer association of genomic elements with the whole transcriptome assessment of RNAs associated with a particular DNA k-mer association. This can even directly test the hypotheses the authors put forward regarding cluster size, transcriptional activation, correlation between different transcription units’ activation etc.

      To continue, the Genome Architecture Mapping (GAM) method from Ana Pombo’s group has also yielded data sets that connect the long-range contacts between gene-regulatory elements to the TF motifs involved in these motifs, and even provides ready-made analyses that assess how mixed or demixed the TF composition at different interaction hubs is. I do not see why this work and data set is not even acknowledged? I also strongly suggest to analyze, or if they are already sufficiently analyzed, discuss these data in the light of 3D interaction hub size (number of interacting elements) and TF motif composition of the involved genomic elements.

      Further, a preprint from the Alistair Boettiger and Kevin Wang labs from May 2024 also provides direct, single-cell imaging data of all super-enhancers, combined with transcription detection, assessing even directly the role of number of super-enhancers in spatial proximity as a determinant of transcriptional state. This data set and findings should be discussed, not in vague terms but in detailed terms of what parts of the authors’ predictions match or do not match these data.

      For these data sets, an analysis in terms of the authors’ key predictions must be carried out (unless the underlying papers already provide such final analysis results). In answering this comment, what matters to me is not that the authors follow my suggestions to the letter. Rather, I would want to see that the wealth of available biological data and knowledge that connects to their predictions is used to their full potential in terms of rejecting, confirming, refining, or putting into real biological context the model predictions made in this study.

      References for point (iii):

      - RNA promotes the formation of spatial compartments in the nucleus https://www.cell.com/cell/fulltext/S0092-8674(21)01230-7?dgcid=raven_jbs_etoc_email

      - Complex multi-enhancer contacts captured by genome architecture mapping https://www.nature.com/articles/nature21411

      - Cell-type specialization is encoded by specific chromatin topologies https://www.nature.com/articles/s41586-021-04081-2

      - Super-enhancer interactomes from single cells link clustering and transcription https://www.biorxiv.org/content/10.1101/2024.05.08.593251v1.full

      For point (i) and point (ii), the authors should go through the relevant literature on Pol II and TF clustering, how this connects to genomic features that support the cluster formation, and also the recent literature on TF specificity. On the last point, TF specificity, especially the groups of Ben Sabari and Mustafa Mir have presented astonishing results that seem highly relevant to the Discussion of this manuscript.

      We appreciate the Reviewer’s insightful suggestion that a comparison between our simulation results and experimental data would strengthen the robustness of our model. In response, we have thoroughly reviewed the literature on multi-way chromatin contacts, with particular attention to SPRITE and GAM techniques. However, we found that the currently available experimental datasets lack sufficient statistical power to provide a definitive test of our simulation predictions, as detailed below.

      As noted by the Reviewer, SPRITE experiments offer valuable information on the composition of high-order chromatin clusters (k-mers) that involve multiple genomic loci. A closer examination of the SPRITE data (e.g., Supplementary Material from Ref. [1]) reveals that the majority of reported statistics correspond to 3-mers (three-way contacts), while data on larger clusters (e.g., 8-mers, 9-mers, or greater) are sparse. This limitation hinders our ability to test the demixing-mixing transition predicted in our simulations, which occurs for cluster sizes exceeding 10.

      Moreover, the composition of the k-mers identified by SPRITE predominantly involves genomic regions encoding functional RNAs—such as ITS1 and ITS2 (involved in rRNA synthesis) and U3 (encoding small nucleolar RNA)—which largely correspond to housekeeping genes. Conversely, there is little to no data available for protein-coding genes. This restricts direct comparison to our simulations, where the demixing-mixing transition depends critically on the interplay between housekeeping and tissue-specific genes.

      Similarly, while GAM experiments are capable of detecting multi-way chromatin contacts, the currently available datasets primarily report three-way interactions [2,3].

      In summary, due to the limited statistical data on higher-order chromatin clusters [4], a quantitative comparison between our simulation results and experimental observations is not currently feasible. Nevertheless, we have now briefly discussed the experimental techniques for detecting multi-way interactions in the revised manuscript to reflect the current state of the field, mentioning most of the references that the Reviewer suggested.

      (4) Another conceptual point that is a critical omission is the clarification that there are, in fact, known large vs. small transcription factories, or transcriptional clusters, which are specific to stem cells and ”stressed cells”. This distinction was initially established by Ibrahim Cisse’s lab (Science 2018) in mouse Embryonic Stem Cells, and also is seen in two other cases in differentiated cells in response to serum stimulus and in early embryonic development:

      - Mediator and RNA polymerase II clusters associate in transcription-dependent condensates https://www.science.org/doi/10.1126/science.aar4199

      - Nuclear actin regulates inducible transcription by enhancing RNA polymerase II clustering https://www.science.org/doi/10.1126/sciadv.aay6515

      - RNA polymerase II clusters form in line with surface condensation on regulatory chromatin https://www.embopress.org/doi/full/10.15252/msb.202110272

      - If ”morphology” should indeed be discussed, the last paper is a good starting point, especially in combination with this additional paper: Chromatin expansion microscopy reveals nanoscale organization of transcription and chromatin https://www.science.org/doi/10.1126/science.ade5308

      We thank the Reviewer for pointing out the discussion about small and large clusters observed in stressed cells. Our study aims to provide a broader mechanistic explanation of the formation of mixed and demixed TF clusters depending on their size. However, to avoid confusion between our terminology and the classification already used for transcription factories in stem and stressed cells, we have now added some comments and references in the revised text.

      (5) The statement scripts are available upon request is insufficient by current FAIR standards and seems to be non-compliant with eLife requirements. At a minimum, all, and I mean all, scripts that are needed to produce the simulation outcomes and figures in the paper, must be deposited as a publicly accessible Supplement with the article. Better would be if they would be structured and sufficiently documented and then deposited in external repositories that are appropriate for the sharing of such program code and models.

      We fully agree with the Reviewer. We have now included in the main text a link to an external repository containing all the codes required to reproduce and analyze the simulations.

      Recommendations for the authors:

      Minor and technical points

      (6) Red, green, and yellow (mix of green and red) is a particularly bad choice of color code, seeing that red-green blindness is the most common color blindness. I recommend to change the color code.

      We appreciate the Reviewer’s thoughtful comment regarding color accessibility. We fully agree that red–green combinations can pose challenges for color-blind readers. In our figures, however, we chose the red–green–yellow color scheme deliberately because it provides strong contrast and intuitive representation for different TF/TU types. To ensure accessibility, we optimized brightness and saturation within red-green schemes and we carefully verified that the chosen hues are distinguishable under the most common forms of color vision deficiency, i.e. trichromatic color blindness, using color-blindness simulation tools (e.g., Coblis).

      How is the dispersing effect of transcriptional activation and ongoing transcription accounted for or expected to affect the model outcome? This affects both transcriptional clusters (they tend to disintegrate upon transcriptional activation) as well as the large scale organization, where dispersal by transcription is also known.

      We thank the Reviewer for this very insightful question. The current versions of both our toy model and the more complex HiP-HoP model do not incorporate the effects of RNA Polymerase elongation. Our primary goal was to develop a minimalistic framework that focuses on investigating TF cluster formation and composition. Nevertheless, we find that this straightforward approach provides good agreement between simulations and Hi-C and GRO-seq experiments, lending confidence to the reliability of our results concerning TF cluster composition.

      We fully agree, however, that the effects of transcription elongation are an interesting topic for further exploration. For example, modeling RNA Polymerases as active motors that continually drive the system out of equilibrium could influence the chromatin polymer conformation and the structure of TF clusters. Additionally, investigating how interactions between RNA molecules and nuclear proteins, such as SAF-A, might lead to significant changes in 3D chromatin organization and, consequently, transcription [5], is also an intriguing prospect. Although we do not believe that the main findings of our study, particularly regarding cluster composition and the mixed-demixed transition, would be impacted by transcription elongation effects, we recognize the importance of this aspect. As such, we have now included some comments in the Conclusions section of the revised manuscript.

      “and make the reasonable assumption that a TU bead is transcribed if it lies within 2.25 diameters (2.25σ) of a complex of the same colour; then, the transcriptional activity of each TU is given by the fraction of time that the TU and a TF:pol lie close together.” How is that justified? I do not see how this is reasonable or not, if you make that statement you must back it up.

      As pointed out by the Referee, we consider a TU to be active if at least one TF is within a distance 2.25σ from that TU. This threshold is slightly larger than the TU-TF interaction cutoff distance, r<sub>c</sub> = 1.8σ. The rationale for this choice is to ensure that, in the presence of a TU cluster surrounded by TFs, TUs that are not directly in contact with a TF are still considered active. Nonetheless, we find that using slightly different thresholds, such as 1.8σ or 1.1σ, leads to comparable results, as shown in Fig. S11, demonstrating the robustness of our analysis.
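      The activity criterion can be illustrated with a minimal sketch: a TU counts as active in a frame if at least one same-colour TF lies within the threshold distance, and its transcriptional activity is the fraction of frames in which this holds. The function name `tu_activity`, the trajectory layout, and the toy coordinates are hypothetical and are not taken from the authors' code.

```python
import math


def tu_activity(tu_traj, tf_trajs, threshold=2.25):
    """Fraction of frames in which the TU lies within `threshold` (in units
    of sigma) of at least one TF of the same colour.

    tu_traj:  list of (x, y, z) positions of one TU, one entry per frame.
    tf_trajs: list of trajectories, one per same-colour TF, each with the
              same number of frames as tu_traj.
    """
    active_frames = 0
    for frame, tu_pos in enumerate(tu_traj):
        if any(math.dist(tu_pos, tf[frame]) <= threshold for tf in tf_trajs):
            active_frames += 1
    return active_frames / len(tu_traj)


# Hypothetical toy data: one TU fixed at the origin, one TF approaching it.
tu = [(0.0, 0.0, 0.0)] * 4
tf = [(5.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# Compare the threshold used in the paper with the two alternatives quoted.
for cutoff in (2.25, 1.8, 1.1):
    print(cutoff, tu_activity(tu, [tf], threshold=cutoff))
```

      Re-running the same trajectories with several cutoffs, as in the loop above, is one simple way to check how sensitive the activity estimate is to the chosen threshold.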

      Clearly, close proximity in 1D genomic space favours formation of similarly-coloured clusters. This is not surprising, it is what you built the model to do. Should not be presented as a new insight, but rather as a check that the model does what is expected.

      We believed that this sentence already conveyed that the formation of single-color clusters driven by 1D genomic proximity is not a surprising outcome. However, we have now slightly rephrased it to better emphasize that this is not a novel insight.

      That said, we would like to highlight that while 1D genomic proximity facilitates the formation of clusters of the same color, the unmixed-to-mixed transition in cluster composition is not easily predictable solely from the TU color pattern. Furthermore, in simulations of real chromosomes, where TU patterns are dictated by epigenetic marks, the complexity of these patterns makes it challenging—if not impossible—to predict cluster composition based solely on the input data of our model.

      “…how closely transcriptional activities of different TUs correlate…” Please briefly state over what variable the correlation is carried out, is it cross correlation of transcription activity time courses over time? Would be nice to state here directly in the main text to make it easier for the reader.

      We have now included a brief description in the revised manuscript explaining how the transcriptional correlations were evaluated and how the correlation matrix was constructed.
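      One common way to build such a matrix is to compute the Pearson correlation between the activity time series of every pair of TUs. The sketch below assumes this approach; whether the authors correlate over time, over independent runs, or both is not stated in this reply, and the function names and toy traces are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(activity):
    """activity[i] is the 0/1 activity time series of TU i."""
    n_tu = len(activity)
    return [[pearson(activity[i], activity[j]) for j in range(n_tu)]
            for i in range(n_tu)]


# Hypothetical binary activity traces for three TUs (1 = transcribing).
activity = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],  # always active together with TU 0
    [0, 0, 1, 1, 0, 1],  # active only when TU 0 is silent
]
corr = correlation_matrix(activity)
print(corr[0][1], corr[0][2])
```

      In this toy case, TUs that are always co-active give a correlation of +1, while TUs that are never active at the same time give -1, which corresponds to the positive and negative blocks discussed for the correlation heatmaps.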

      “The second concerns how expression quantitative trait loci (eQTLs) work. Current models see them doing so post-transcriptionally in highly-convoluted ways [11, 55], but we have argued that any TU can act as an eQTL directly at the transcriptional level [11].” This text does not actually explain what eQTLs do. I think it should, in concise words.

      We agree with the Referee’s suggestion. We have revised the sentence accordingly and now provide a clear explanation of eQTLs upon their first mention. The revised paragraph now reads as follows:

      “The second concerns how expression quantitative trait loci (eQTLs)—genomic regions that are statistically associated with variation in gene expression levels—function. While current models often attribute their effects to post-transcriptional regulation through complex mechanisms [6,7], we have previously argued that any transcriptional unit (TU) can act as an eQTL by directly influencing gene expression at the transcriptional level [7]. Here, we observe individual TUs up-regulating or down-regulating the activity of other TUs – hallmark behaviors of eQTLs that can give rise to genetic effects such as “transgressive segregation” [8]. This phenomenon refers to cases in which alleles exhibit significantly higher or lower expression of a target gene, and can be, for instance, caused by the creation of a non-parental allele with a specific combination of QTLs with opposing effects on the target gene.”

      “In the string with 4 mutations, a yellow cluster is never seen; instead, different red clusters appear and disappear (Fig. 2Eii)…” How should it be seen? You mutated away most of the yellow beads. I think the kymograph is more informative about the general model dynamics, not the effects of mutations. Might be more appropriate to place a kymograph in Figure 1.

      We agree with the Referee that the kymograph is the most appropriate graphical representation for capturing the effects of mutations. Panel 2E already refers to the standard case shown in Figure 1. We have now clarified this both in the caption and in the main text. In addition, we have rephrased the sentence—which was indeed misleading—as follows:

      “From the activity profiles in Fig. 2C, we can observe that as the number of mutations increases, the yellow cluster is replaced by a red cluster, with the remaining yellow TUs in the region being expelled (Fig. 2B(ii)). This behavior is reflected in the dynamics, as seen by comparing panels E(i) and E(ii): in the string with four mutations, transcription of the yellow TUs is inhibited in the affected region, while prominent red stripes—corresponding to active, transcribing clusters—emerge (Fig. 2E(ii)).” We hope that the comparison is now immediately clear to the reader.

      “…but this block fragments in the string with 4 mutations…” I don’t know or cannot see what is meant by ”fragmentation” in the correlation matrix.

      With the sentence “this block fragments in the string with 4 mutations” we mean that the majority of the solid red pixels within the black box become light-red or white once the mutations are applied. We have now added a clarification of this point in the revised manuscript.

      “Fig. 3D shows the difference in correlation between the case with reduced yellow TFs and the case displayed in Fig. 1E.” Can you just place two halves of the different matrices to be compared into the same panel? Similar to Fig. S5. Will be much easier to compare.

      We thank the Referee for this suggestion. We tried to implement this modification, and report the modified figure below (Author response image 1). As we can see, in the new figure it is difficult to spot the details we refer to in the main text, therefore we prefer to keep the original version of the figure.

      Author response image 1.

      Heatmap comparing activity correlations of TUs in the random string under normal conditions (top half) and with reduced yellow-TF concentration (bottom half).

      What is the omnigenic model? It is not introduced.

      We thank the Reviewer for highlighting this important point. The omnigenic model, first introduced by Boyle et al. in Ref. [6], was proposed to explain how complex traits, including disease risk, are influenced by a vast number of genes. According to this model, the genetic basis of a trait is not limited to a small set of core genes whose expression is directly related to the trait, but also includes peripheral genes. The latter, although not directly involved in controlling the trait, can influence the expression of core genes through gene regulatory networks, thereby contributing to the overall genetic influence on the trait. We have now added a few lines in the revised manuscript to explain this point.

      “Additionally, blue off-diagonal blocks indicate repeating negative correlations that reflect the period of the 6-pattern.” How does that look in a kymograph? Does this mean the 6 clusters of same color steal the TFs from the other clusters when they form?

      The intuition of the Referee is indeed correct. The finite number of TFs leads to competition among TUs of the same colour, resulting in anticorrelation: when a group of six nearby TUs of a given colour is active, other, more distant TUs of the same colour are not transcribing due to the lack of available TFs. As the Referee suggested, this phenomenon is visible in the kymograph showing TU activity. In Author response image 2, it can be observed that typically there is a single TU cluster for each of the three colours (yellow, green, and red). These clusters can be long-lived (e.g., the yellow cluster at the center of the kymograph) or may dissolve during the simulation (e.g., the red cluster at the top of the kymograph, which dissolves at t ∼ 600 × 10<sup>5</sup> τ<sub>B</sub>). In the latter case, TFs of the corresponding colour are released into the system and can bind to a different location, forming a new cluster (as seen with the red cluster forming at the bottom of the kymograph for t > 600 × 10<sup>5</sup> τ<sub>B</sub>). This point is further discussed at point 2.30 of this Reply, where additional graphical material is provided.

      Author response image 2.

      Kymograph showing the TU activity during a typical run in the 6-pattern case. Each row reports the transcriptional state of a TU during one simulation. Black pixels correspond to inactive TUs, red (yellow, green) pixels correspond to active red (yellow, green) TUs.

      “Conversely, negative correlations connect distant TUs, as found in the single-color model…” But at the most distal range, the negative correlation is lost again! Why leave this out? Your correlation curves show the same equilibration towards no correlation at very long ranges.

      As highlighted in Figure 5Ai, long-range negative correlations (grey segments) predominantly connect distant TUs of the same colour. This is quantified in Figure 5Bi: restricting to same-colour TUs shows that at large genomic separations the correlation is almost entirely negative, with small fluctuations at distances just below 3000 kbp where sampling is sparse; we therefore avoid further interpretation of this regime.

      “These results illustrate how the sequence of TUs on a string can strikingly affect formation of mixed clusters; they also provide an explanation of why activities of human TUs within genomic regions of hundreds of kbp are positively correlated [60].” This is a very nice insight.

      We thank the Reviewer for the very supportive comment.

      “To quantify the extent to which TFs of different colours share clusters, we introduce a demixing coefficient, θ<sub>dem</sub> (defined in Fig. 1).” This is not defined in Fig. 1 or anywhere else here in the main text.

      We thank the Referee for pointing this out. For a given cluster, the demixing coefficient is defined as

      where n is the number of colors, i indexes each color present in the model, and x<sub>i,max</sub> is the largest fraction of TFs of the same i-th color in a single TF cluster.

      The demixing coefficient is defined in the Methods section; therefore, we have replaced “defined in Fig. 1” with “see Methods for definition”.
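      To make the quantities n and x<sub>i,max</sub> concrete, the sketch below computes a per-cluster demixing coefficient. The displayed formula is not reproduced in this text, so the normalised form θ<sub>dem</sub> = (n·x<sub>i,max</sub> − 1)/(n − 1) used here is an assumption, chosen so that a single-colour cluster gives 1 and a cluster with all n colours in equal numbers gives 0; the function name and the toy clusters are hypothetical.

```python
from collections import Counter


def demixing_coefficient(tf_colours, n_colours):
    """Demixing coefficient of one cluster, ASSUMING the normalised form
    theta = (n * x_max - 1) / (n - 1), where x_max is the largest fraction
    of same-coloured TFs in the cluster.

    Returns 1 for a single-colour cluster and 0 when all n colours are
    present in equal numbers.
    """
    if n_colours < 2:
        raise ValueError("need at least two colours")
    counts = Counter(tf_colours)
    x_max = max(counts.values()) / len(tf_colours)
    return (n_colours * x_max - 1) / (n_colours - 1)


# Hypothetical clusters in a three-colour model.
pure = ["red"] * 6
mixed = ["red", "green", "yellow"] * 2
partial = ["red"] * 4 + ["green"] * 2

for cluster in (pure, mixed, partial):
    print(demixing_coefficient(cluster, n_colours=3))
```

      Under this assumed form, intermediate compositions fall between the two limits, so the coefficient can be averaged over clusters of a given size to track a demixed-to-mixed transition.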

      “Mixing is facilitated by the presence of weakly-binding beads, as replacing them with non-interacting ones increases demixing and reduces long-range negative correlations (Figure S3). Therefore, the sequence of strong and weak binding sites along strings determines the degree of mixing, and the types of small-world network that emerge. If eQTLs also act transcriptionally in the way we suggest [11], we predict that down-regulating eQTLs will lie further away from their targets than up-regulating ones.” Going into these side topics and minor points here is super distracting and waters down the message. Maybe first deal with the main conclusions on mixed vs demixed clusters in dependence on the strong and specific binding site patterns, before dealing with other additional points like the role of weak binding sites.

      Thank you for the suggestion. We have now changed the paragraph to highlight the main results. The new paragraph reads as follows: “These results on activity correlation and TF cluster composition suggest that, if eQTLs act transcriptionally as expected [7], down-regulating eQTLs are likely to be located further from their target genes than up-regulating ones. In addition, it is important to note that mixing is promoted by the presence of weakly binding beads; replacing these with non-interacting ones leads to increased demixing and a reduction in long-range negative correlations (Figure S3). More generally, our findings indicate that the presence of multiple TF colors offers an effective mechanism to enrich and fine-tune transcriptional regulation.”

      “…provides a powerful pathway to enrich and modulate transcriptional regulation.” Before going into the possible meaning and implications of the results, please discuss the results themselves first.

      See previous point.

      Figure 5B. Does activation typically coincide with spatial compaction of the binding sites into a small space or within the confines of a condensate? My guess would be that colocalization of the other color in a small space is what leads to the mixing effect?

      As the Reviewer correctly noted, the activity of a given TU is indeed influenced by the presence of nearby TUs of the same color, since their proximity facilitates the recruitment of additional TFs and enhances the overall transcriptional activity. In this context, the mixing effect is certainly affected by the 1D arrangement of TUs along the chromatin fiber. As emphasized in the revised manuscript, when domains of same-color TUs are present (as in the 6-pattern string), the degree of demixing is greater compared to the case where TUs of different colors alternate and large domains are absent (as in the 1-pattern string). This difference in the demixing parameter as a function of the 1D TU arrangement is clearly visible in Fig. S2B.

      “…euchromatic regions blue, and heterochromatic ones grey.” Please also explain what these color monomers mean in terms of non specific interactions with the TFs.

      Generally, in our simulation approach we assume euchromatin regions to be more open and accessible to transcription factors, whereas heterochromatin corresponds to more compacted chromatin segments [9]. To reflect this, we introduce weak, non-specific interactions between euchromatin and TFs, while heterochromatin interacts with TFs only through steric effects. To clarify this point, we have now slightly revised the caption of Fig. 6.

      “More quantitatively, Spearman’s rank correlation coefficient is 3.66 × 10<sup>−1</sup>, which compares with 3.24 × 10<sup>−1</sup> obtained previously using a single-colour model [11].” This comparison does not tell me whether the improvement in model performance justifies an additional model component. There are other, likelihood based approaches to assess whether a model fits better to a relevant extent by adding a free model parameter. Can these be used for a more conclusive comparison? Besides, a correlation of 0.36 does not seem so good?

      We understand the Reviewer’s concern that the observed increase in the activity correlation may not appear to provide strong evidence for the improvement of the newly introduced model. However, within the context of polymer models developed to study realistic gene transcription and chromatin organization, this type of correlation analysis is a widely accepted approach for model validation. Experimental data commonly used for such validation include Hi-C maps, FISH experiments, and GRO-seq data [10,11]. The first two are typically employed to assess how accurately the model reproduces the 3D folding of chromatin; a comparison between experimental and simulated Hi-C maps is provided in the Supplementary Information (Fig. S5), showing a Pearson correlation of 0.7. GRO-seq or RNA-seq data, on the other hand, are used to evaluate the model’s ability to predict gene transcription levels. To date, the highest correlation for transcriptional activity data has been achieved by the HiP-HoP model at a resolution of 1 kbp [10], reporting a Spearman correlation of 0.6. Therefore, the correlation obtained with our 2-color model represents a good level of agreement when compared with the more complex HiP-HoP model. In this context, the observed increase in correlation—from 0.324 to 0.366—can be regarded as a modest yet meaningful improvement.
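For readers less familiar with the metric, the sketch below shows how a Spearman rank correlation between simulated and measured activities can be computed: rank both series (ties share their average rank), then take the Pearson correlation of the ranks. The function names and the five-point data set are hypothetical illustrations, not values from the manuscript or from GRO-seq data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-TU activities (simulation) and read counts (experiment).
simulated = [0.1, 0.4, 0.35, 0.8, 0.05]
experimental = [12, 30, 45, 50, 3]
rho = spearman(simulated, experimental)
```

Because only ranks enter, the coefficient is insensitive to any monotonic rescaling of either series, which is why it suits comparisons between simulated activity fractions and sequencing read counts.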

      “…consequently, use of an additional color provides a statistically significant improvement (p-value < 10<sup>−6</sup>, 2-sided t-test).” I do not follow this argument. Given enough simulation repeats, any improvement, no matter how small, will lead to statistically significant improvements.

      We agree that this sentence could be misleading. We have now rephrased it in a clearer manner specifying that each of the two correlation values is statistically significant alone, while before we were wrongly referring to the significance of the improvement.

      “Additionally, simulated contact maps show a fair agreement with Hi-C data (Figure S5), with a Pearson correlation r ∼ 0.7 (p-value < 10<sup>−6</sup>, 2-sided t-test).” Nice!

      We thank the Reviewer for the positive comment.

      “Because we do not include heterochromatin-binding proteins, we should not however expect a very accurate reproduction of Hi-C maps: we stress that here instead we are interested in active chromatin, transcription and structure only as far as it is linked to transcription.” Then why do you not limit your correlation assessment to only these regions to show that these are very well captured by your model?

      We thank the Reviewer for this insightful comment. Indeed, we could have restricted our investigation to active chromatin regions, as done in our previous works [11,12]. However, our intention in this section of the manuscript was to clarify that the current model is relatively simple and therefore not expected to achieve a very high level of agreement between experimental and simulated Hi-C maps. Another important limitation of the two-color model described in the section is the absence of active loop extrusion mediated by SMC proteins, which is known to play a central role in establishing TAD boundaries. Consequently, even if our analysis were limited to active chromatin regions, the agreement with experimental Hi-C maps would still remain lower than that obtained with more comprehensive models, such as HiP-HoP, which we use in the last section of the paper. We have now added a comment in the revised manuscript explicitly noting the lack of active loop extrusion in our 2-color model.

      “We also measure the average value of the demixing coefficient, θ<sub>dem</sub> (Materials and Methods). If θ<sub>dem</sub> = 1, this means that a cluster contains only TFs of one colour and so is fully demixed; if θ<sub>dem</sub> = 0, the cluster contains a mixture of TFs of all colors in equal number, and so is maximally mixed.” Repetitive.

      We have now rephrased the sentence in a more concise way.

      “…notably, this is similar to the average number of productively transcribing pols seen experimentally in a transcription factory [6].” That seems a bit fast and loose. The number of Polymerases can differ depending on state, type of factory, gene etc. and vary between anything from to a few hundreds of Polymerase complexes depending on definition of factory, and what is counted as active. Also, one would think that polymerases only make up a small part of the overall protein pool that constitutes a condensate, so it is unclear whether this is a pertinent estimate.

      Here we refer to the average size of what is normally referred to as a PolII factory, not a generic nuclear condensate. These are the clusters which arise in our simulations. These structures emerge through microphase separation and have been well characterised; see, for instance, [13] for a recent review. For these structures, while there is a distribution, the average is well defined and corresponds to a size of about 100 nm, which is very much in line with the size of the clusters we observe, both in terms of 3D diameter and number of participating proteins. Because of the size, the number of active complexes which can contribute cannot be significantly more than ∼ 10. These estimates are, we note, very much in line with super-resolution measurements of SAF-A clusters [14], which are associated with active transcription, and hence it is reasonable to assume they colocalise with RNA and polymerase clusters.

      “Conversely, activities of similar TUs lying far from each other on the genetic map are often weakly negatively correlated, as the formation of one cluster sequesters some TFs to reduce the number available to bind elsewhere.” This point is interesting, and I strongly suspect that this is indeed happening. But I don’t think it was shown in the analysis of the simulation results in sufficient clarity. We need direct assessment of this sequestration, currently it’s only indirectly inferred.

      Indeed, this is the mechanism underlying the emergence of negative long-range correlations among TU activity values. As the Reviewer correctly pointed out, the competition for a finite number of TFs was only indirectly inferred in the original manuscript. To address this, we have now included a new figure explicitly illustrating this effect. In Fig. S12, we show the kymograph of active TUs (left panel), as in Fig. 2E(i) of the main text, alongside a new kymograph depicting the number of green TFs within a sphere of radius 10σ centered on each green TU (right panel). For simplicity, we focus here only on green TUs and TFs. It can be observed that, during the initial part of the simulation, green TFs are localized near genomic position ∼ 2000 (right panel), where green TUs are transcriptionally active (left panel). Toward the end of the simulation, TUs near genomic position ∼ 500 become active, coinciding with the relocation of TFs to this region and the depletion of the previous one.
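The quantity plotted in the right-hand kymograph can be sketched as a simple neighbour count: for each TU, count the TFs whose distance from it is at most 10σ. The snippet below is a minimal illustration under stated assumptions: coordinates are in units of σ, periodic boundaries are ignored, and the function name and the example positions are hypothetical rather than taken from the simulation code.

```python
import math


def count_tfs_near_tus(tu_positions, tf_positions, radius=10.0):
    """For each TU, count the TFs lying within `radius` (in units of sigma)."""
    counts = []
    for tu in tu_positions:
        n_close = sum(1 for tf in tf_positions if math.dist(tu, tf) <= radius)
        counts.append(n_close)
    return counts


# Hypothetical snapshot: two green TUs and four green TFs (x, y, z in sigma).
green_tus = [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
green_tfs = [(1.0, 2.0, 0.0), (3.0, -4.0, 0.0), (48.0, 1.0, 1.0), (100.0, 0.0, 0.0)]
counts = count_tfs_near_tus(green_tus, green_tfs)
```

Repeating this count for every saved time frame gives one kymograph column per frame, with one row per TU.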

      In the definition for the demixing coefficient (equation 1), what does the index i stand for?

      Here i is an index denoting each of the colors present in the model. We have now specified the meaning of i after Eq. 1.
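As an illustration of how the coefficient behaves, the sketch below evaluates it for a single cluster given the colours of its TFs. It assumes the linear form θ<sub>dem</sub> = (n·x<sub>max</sub> − 1)/(n − 1), which reproduces the two limits described in the manuscript (1 for a single-colour cluster, 0 for an equal mixture) but is an assumption here, since the equation itself is not shown in this response. The function name and the example clusters are hypothetical.

```python
from collections import Counter


def demixing_coefficient(tf_colours, n_colours):
    """Assumed form: theta = (n * x_max - 1) / (n - 1) for one cluster.

    tf_colours: colour label of every TF in the cluster.
    n_colours:  number of TF colours present in the model.
    """
    if n_colours < 2:
        raise ValueError("need at least two colours")
    if not tf_colours:
        raise ValueError("cluster is empty")
    counts = Counter(tf_colours)
    # Largest fraction of same-coloured TFs in this cluster.
    x_max = max(counts.values()) / len(tf_colours)
    return (n_colours * x_max - 1) / (n_colours - 1)


# Hypothetical clusters in a three-colour model.
pure = demixing_coefficient(["red"] * 6, n_colours=3)
mixed = demixing_coefficient(["red", "green", "yellow"] * 2, n_colours=3)
```

Averaging the per-cluster values over all clusters and frames would then give the mean demixing coefficient discussed in the text.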

      Reviewer 3 (Public Review):

      In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90 kb). These colored beads are considered transcription units (TUs). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlated) “gene expression” patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, it leads to the emergence of specialized and mixed clusters of different TFs. Human chromatin models with a realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.

      Strengths.

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

      Weaknesses.

      Weakness of the work: The model has many assumptions. Some of the assumptions are a bit too simplistic. Concerns about the work are detailed below:

      We thank the Referee for this overall positive evaluation.

      The authors assume that when the diffusing beads (TFs) are near a TU, the gene expression starts. However, mammalian gene expression requires activation by enhancer-promoter looping and other related events. It is not a simple diffusion-limited event. Since many of the conclusions are derived from expression activity, will the results be affected by the lack of looping details?

      We do not need to assume promoter-enhancer contact: it emerges naturally through bridging-induced phase separation, which is a key strength of our model. Even though looping is not assumed to be key to transcriptional initiation, in practice the vast majority of events in which a TF is near a TU are associated with the presence of a cluster where regulatory elements are looped. Transcription in our case is therefore associated with bridging-induced phase separation, and looping is not lacking: it accompanies transcription as an emergent property of the model rather than an assumption. Accordingly, both contact maps and transcriptional activity are well predicted by our model, both in the version described here and in the more sophisticated single-colour HiP-HoP model [10] (an important ingredient of which is the bridging-induced phase separation).

      Authors neglect protein-protein interactions. Without protein-protein interactions, condensate formation in natural systems is unlikely to happen.

      We thank the Reviewer for pointing out the absence of protein-protein interactions in our simulations. While we acknowledge this limitation, we would like to emphasize that experimental studies have not observed nuclear proteins forming condensates at physiological concentrations in the absence of DNA or chromatin. For example, studies such as Ryu et al. [15] and Shakya et al. [16] show that protein-protein interactions alone are insufficient to drive condensate formation in vivo. Instead, the presence of a substrate, such as DNA or chromatin, is essential to favor and stabilize the formation of protein clusters.

      In our simulations, we propose that protein liquid-liquid phase separation (LLPS) is driven by the presence of both strong and weak attractions between multivalent protein complexes and the chromatin filament. As stated in our manuscript, the mechanism leading to protein cluster formation is the bridging induced attraction. This mechanism involves a positive feedback loop, where protein binding to chromatin induces a local increase in chromatin density, which then attracts more proteins, further promoting cluster formation.

      While we acknowledge that protein-protein interactions could be incorporated into our simulations, we believe this would need to be a weak interaction to remain consistent with experimental data. Additionally, incorporating such interactions would not alter the conclusions of our study.

      What is described in this paper is a generic phenomenon; many kinds of multivalent chromatin-binding proteins can form condensates/clusters as described here. For example, if we replace different color TUs with different histone modifications and different TFs with Hp1, PRC1/2, etc, the results would remain the same, wouldn’t they? What is specific about transcription factor or transcription here in this model? What is the logic of considering 3kb chromatin as having a size of 30 nm? See Kadam et al. (Nature Communications 2023). Also, DNA paint experimental measurement of 5kb chromatin is greater than 100 nm (see work by Boettiger et al.).

      We thank the Reviewer for this important observation, which we now address. To begin, we consider the toy model introduced in the first part of the manuscript, where TUs are randomly positioned rather than derived from epigenetic data. As the Reviewer points out, in this simplified context, our results reflect a generic phenomenon: the composition of clusters depends primarily on their size, independent of the specific types of proteins involved. However, the main goal of our work is to gain insights into apparently contradictory experimental findings, which show that some transcription factories consist of a single type of transcription factor, while others contain multiple types. This led us to focus on TF clusters and their role in transcriptional regulation and co-regulation of distant genes. Therefore, in the second part of the manuscript, we use DNase I hypersensitive site (DHS) data to position TUs based on predicted TF binding sites, providing a more biological framework. In both the toy model and the more realistic HiP-HoP model, we observe a size-dependent transition in cluster composition. However, we refrain from generalizing these results to clusters composed of other protein complexes, such as HP1 and PRC, as their binding is governed by distinct epigenetic marks (e.g. H3K9me3 and H3K27me3), which exhibit different genomic distributions compared to DHS marks.

      Finally, the mapping of 3 kbp to 30 nm is an estimate which does not significantly impact our conclusions. The relationship between genomic distance (in kbp) and spatial distance (in nm) is highly dependent on the degree of chromatin compaction, which can vary across cell types and genomic context. As such, providing an exact conversion is challenging [17]. For example, in a previous work based on the HiP-HoP model [12] we compared simulated and experimental FISH measurements and found that 1 kbp typically corresponds to 15–20 nm, implying that 3 kbp could span up to 60 nm. Nevertheless, we emphasize that varying this conversion factor does not affect the core results or conclusions of our study. We have now included a clarification in the revised SI to highlight this point.

      Recommendations for the authors:

      Other points.

      Figure 1(D) caption says 2.25σ = 1.6 nanometer. Is this a typo? Sigma is 30nm.

      Yes, it was. As 1σ ∼ 30 nm, we have 2.25σ = 2.25 · 30 nm = 67.5 nm ∼ 6.8 × 10<sup>−8</sup> m. We have now corrected the caption.

      Page 6, 2nd column, 3rd para: it is written that θ<sub>dem</sub> is “defined in Fig. 1”. There is no θ<sub>dem</sub> defined in Fig. 1, is there? I can see it defined in Methods but not in Fig. 1.

      Correct, we replaced “defined in Fig. 1” with “see Methods for definition”.

      Page 6, column 2, 4th para: what do “correlations overlap” and “correlations diverge” mean?

      With reference to the plots in Fig. 5B, “overlap” and “diverge” simply refer to whether the same-colour (red curves) and different-colour (blue curves) correlation trends lie on top of each other or not. We have now clarified this point.

      What is the precise definition of correlation in Fig 5B (Y-axis)?

      In Fig. 5B, correlation means Pearson correlation. We have now specified this in the revised text and in the caption of Fig. 5.

      References

      (1) S. A. Quinodoz, J. W. Jachowicz, P. Bhat, N. Ollikainen, A. K. Banerjee, I. N. Goronzy, M. R. Blanco, P. Chovanec, A. Chow, Y. Markaki et al., “RNA promotes the formation of spatial compartments in the nucleus,” Cell, vol. 184, no. 23, pp. 5775–5790, 2021.

      (2) R. A. Beagrie, A. Scialdone, M. Schueler, D. C. Kraemer, M. Chotalia, S. Q. Xie, M. Barbieri, I. de Santiago, L.-M. Lavitas, M. R. Branco et al., “Complex multi-enhancer contacts captured by genome architecture mapping,” Nature, vol. 543, no. 7646, pp. 519–524, 2017.

      (3) R. A. Beagrie, C. J. Thieme, C. Annunziatella, C. Baugher, Y. Zhang, M. Schueler, A. Kukalev, R. Kempfer, A. M. Chiariello, S. Bianco et al., “Multiplex-GAM: genome-wide identification of chromatin contacts yields insights overlooked by Hi-C,” Nature Methods, vol. 20, no. 7, pp. 1037–1047, 2023.

      (4) L. Liu, B. Zhang, and C. Hyeon, “Extracting multi-way chromatin contacts from Hi-C data,” PLOS Computational Biology, vol. 17, no. 12, p. e1009669, 2021.

      (5) R.-S. Nozawa, L. Boteva, D. C. Soares, C. Naughton, A. R. Dun, A. Buckle, B. Ramsahoye, P. C. Bruton, R. S. Saleeb, M. Arnedo et al., “SAF-A regulates interphase chromosome structure through oligomerization with chromatin-associated RNAs,” Cell, vol. 169, no. 7, pp. 1214–1227, 2017.

      (6) E. A. Boyle, Y. I. Li, and J. K. Pritchard, “An expanded view of complex traits: from polygenic to omnigenic,” Cell, vol. 169, no. 7, pp. 1177–1186, 2017.

      (7) C. Brackley, N. Gilbert, D. Michieletto, A. Papantonis, M. Pereira, P. Cook, and D. Marenduzzo, “Complex small-world regulatory networks emerge from the 3D organisation of the human genome,” Nat. Commun., vol. 12, no. 1, pp. 1–14, 2021.

      (8) R. B. Brem and L. Kruglyak, “The landscape of genetic complexity across 5,700 gene expression traits in yeast,” Proceedings of the National Academy of Sciences, vol. 102, no. 5, pp. 1572–1577, 2005.

      (9) M. Chiang, C. A. Brackley, D. Marenduzzo, and N. Gilbert, “Predicting genome organisation and function with mechanistic modelling,” Trends in Genetics, vol. 38, no. 4, pp. 364–378, 2022.

      (10) M. Chiang, C. A. Brackley, C. Naughton, R.-S. Nozawa, C. Battaglia, D. Marenduzzo, and N. Gilbert, “Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure,” Cell Genomics, vol. 4, no. 12, 2024.

      (11) A. Buckle, C. A. Brackley, S. Boyle, D. Marenduzzo, and N. Gilbert, “Polymer simulations of heteromorphic chromatin predict the 3D folding of complex genomic loci,” Mol. Cell, vol. 72, no. 4, pp. 786–797, 2018.

      (12) G. Forte, A. Buckle, S. Boyle, D. Marenduzzo, N. Gilbert, and C. A. Brackley, “Transcription modulates chromatin dynamics and locus configuration sampling,” Nature Structural & Molecular Biology, vol. 30, no. 9, pp. 1275–1285, 2023.

      (13) P. R. Cook and D. Marenduzzo, “Transcription-driven genome organization: a model for chromosome structure and the regulation of gene expression tested through simulations,” Nucleic Acids Research, vol. 46, no. 19, pp. 9895–9906, 2018.

      (14) M. Marenda, D. Michieletto, R. Czapiewski, J. Stocks, S. M. Winterbourne, J. Miles, O. C. Flemming, E. Lazarova, M. Chiang, S. Aitken et al., “Nuclear RNA forms an interconnected network of transcription-dependent and tunable microgels,” bioRxiv, pp. 2024–06, 2024.

      (15) J.-K. Ryu, C. Bouchoux, H. W. Liu, E. Kim, M. Minamino, R. de Groot, A. J. Katan, A. Bonato, D. Marenduzzo, D. Michieletto et al., “Bridging-induced phase separation induced by cohesin SMC protein complexes,” Science Advances, vol. 7, no. 7, p. eabe5905, 2021.

      (16) A. Shakya, S. Park, N. Rana, and J. T. King, “Liquid-liquid phase separation of histone proteins in cells: role in chromatin organization,” Biophysical Journal, vol. 118, no. 3, pp. 753–764, 2020.

      (17) A.-M. Florescu, P. Therizols, and A. Rosa, “Large scale chromosome folding is stable against local changes in chromatin structure,” PLoS Computational Biology, vol. 12, no. 6, p. e1004987, 2016.

    1. personal data stored.

      Insert the new sections 4 to 6 here, after section 3 (the numbering and layout will of course need to be readjusted at the end ... and please also note the additions at the end here, which you still need to fill in):

      4. Data processing when using TIPAR (platform)

      4.1 User account and login

      Which data: email address, password (encrypted/hashed), optional login via Google (Google account ID), time of registration, technical usage data. Purpose: provision of the user account, login, abuse prevention, technical administration. Legal basis: Art. 6(1)(b) GDPR (contract/user relationship), Art. 6(1)(f) GDPR (security, abuse prevention).

      4.2 Animal profile and stored information

      Which data: details about the animal (e.g. name, characteristics, special features, where applicable medication, routines, veterinarian contact), TIPAR code/QR assignment, documents/notes (if used). Purpose: documentation and retrievability of the information stored by the user as part of the TIPAR service. Legal basis: Art. 6(1)(b) GDPR.

      Note: Information about an animal is generally not personal data. It becomes personal data as soon as it is linked to a user account/identifier.

      4.3 Designated contact persons (e.g. guardian (Pate)/secondary guardian (Zweitpate))

      Which data: name, email, where applicable telephone number, role (guardian/secondary guardian), status (invited/confirmed). Purpose: making contact within the scope of the TIPAR functions (invitation, confirmation, notifications in an emergency, communication concerning the stored information). Legal basis: Art. 6(1)(b) GDPR (performance of the contract with the user), Art. 6(1)(f) GDPR (legitimate interest in reliable contact). The contact details of designated contact persons are provided to TIPAR by the user. We inform designated contact persons about the processing of their data as part of the invitation/notification (Art. 14 GDPR).

      4.4 Emergency access / retrieval via QR code or code entry (if active)

      Which data: retrieval event (time, IP address, device/browser information), code entered, where applicable released contact/animal information, notification to contact persons. Purpose: enabling retrieval and traceability of access (security, abuse and error analysis), notification of the designated contact persons. Legal basis: Art. 6(1)(b) GDPR, Art. 6(1)(f) GDPR (security/logging).

      4.5 Public register entry (optional, only if the user actively switches it on)

      Which data: the details expressly released by the user (scope depending on selection). Purpose: public discoverability in line with the activated additional feature. Legal basis: Art. 6(1)(a) GDPR (consent through active activation/release).

      4.6 Support and contact enquiries

      Which data: contact and communication data, content of the enquiry, where applicable contract/order data for allocation. Purpose: handling enquiries, support, evidence/documentation of processes. Legal basis: Art. 6(1)(b) GDPR, Art. 6(1)(f) GDPR (support, legal enforcement/defence).

      5. Payment processing and shipping

      5.1 Payment service provider Stripe

      If you choose a paid service, payment is processed via Stripe Payments Europe Ltd. In particular, master and transaction data (e.g. name, email address, billing address, amount, payment method, transaction ID) are transmitted to Stripe to the extent necessary for payment processing. Further information can be found in Stripe's privacy notice.

      Legal basis: Art. 6(1)(b) GDPR (contract/payment processing).

      5.2 Production and shipping of physical products (starter set/goodies)

      If your package includes physical products, we process shipping data and, where applicable, personalisation data (e.g. delivery address and assignment information for engraving/QR) for production and delivery. For shipping we use the shipping service provider DHL. In this context we transmit the data required for delivery (in particular name and delivery address and, where applicable, shipment information) to DHL.

      Legal basis: Art. 6(1)(b) GDPR (contract/shipping).

      Shipping service provider: DHL (DHL Paket GmbH, Germany).

      6. Cookies, consents and consent management

      We use cookies and similar technologies on our website. Technically necessary cookies are required to provide the website and enable basic functions. We set other cookies/technologies (e.g. for statistics, marketing or external services) only if you have given your prior consent.

      Legal bases:

      Technically necessary cookies: Art. 6(1)(f) GDPR (legitimate interest in operation/security) or Art. 6(1)(b) GDPR (provision of website functions).

      Cookies/technologies requiring consent: Section 25(1) TDDDG in conjunction with Art. 6(1)(a) GDPR.

      You can change or withdraw your consents at any time via [“Cookie settings”/consent banner].

      (Placeholder that you still need to fill in: “Consent tool: [name], provider: [name], registered office: [country]”. Also: list of the categories/tools used (statistics, marketing, external media, etc.))

    1. The BBC Global Experience Language (GEL) Technical Guides are a series of framework-agnostic, code-centric recommendations and examples for building GEL design patterns in websites. They illustrate how to create websites that comply with all BBC guidelines and industry best practice, giving special emphasis to the BBC Accessibility Guidelines for websites.

      This demonstrates that GEL's documentation focuses on accessibility, which is part of its mission.

    1. Metacognition: Strategies for Successful Learning

      Executive Summary

      This briefing document analyses teaching strategies grounded in metacognition to foster the success of all pupils.

      Metacognition is defined as the set of processes by which an individual regulates their own cognitive activities, thereby becoming the "pilot of their cognition".

      It has two main facets: explicit metacognition, which is conscious knowledge of one's own learning processes ("learning to learn"), and implicit metacognition, which rests on feelings and intrinsic motivation.

      Given the shared observations of attention difficulties, forgetting of knowledge and a lack of motivation among pupils, direct teaching of metacognitive strategies appears to be a powerful lever.

      Concrete approaches include explaining how the brain works, managing attention, regulating memorisation, and developing cognitive flexibility to resist automatic responses.

      A central point is the relationship between success and motivation. Rather than assuming that motivation precedes success, field experience suggests that it is success that generates motivation and the desire to learn.

      By placing pupils in situations of success, offering them accessible tasks and clarifying learning objectives, a virtuous circle of engagement is created.

      This approach is not a revolution but an evolution of professional practice towards more targeted teaching ("less but better") and an effective tool for tackling educational inequalities.

      --------------------------------------------------------------------------------

      1. Foundations of Metacognition

      Metacognition is presented as an effective, research-based teaching method for preventing learning difficulties and helping all students succeed.

      1.1. Definition and Key Capacities

      Metacognition covers all the processes by which individuals regulate their learning.

      According to Frédéric Guy, chargé de mission at the Cézanne, this includes the capacity to:

      • Regulate one's attention

      • Choose to seek information

      • Plan and solve a problem

      • Spot and correct one's own errors

      These processes make it possible to predict whether a task is feasible and to assess one's own performance. They rest on four fundamental capacities:

      1. Setting goals and identifying the actions needed to reach them.

      2. Detecting and identifying errors in order to fix them.

      3. Evaluating one's results and conclusions.

      4. Revising the strategies used.

      1.2. The Two Facets of Metacognition

      Two complementary aspects of metacognition need to be distinguished:

      | Type of Metacognition | Description | Characteristics |
      | --- | --- | --- |
      | Explicit (or Declarative) | The classic approach of "cognition about cognition": the student's ability to verbalize their strategies and their knowledge about learning. | • Conscious and conceptual.<br>• Relies on meta-representations (e.g., "to learn, I have to do this").<br>• Concerns perceptions of tasks ("this is hard") or of oneself ("I am good at math"). |
      | Implicit | Regulation driven by feelings tied to learning. It is linked to motivation and to the intuitive assessment of the effort required. | • Based on feelings and intuitions.<br>• Less conscious, more automatic.<br>• Directly influences motivation and engagement. |

      2. Teaching Approaches for Explicit Metacognition

      The goal is to give students the tools to become autonomous learners.

      The key quotation from Marie Bridenne, pedagogical adviser, sums up this ambition:

      "Developing one's metacognitive skills means becoming the pilot of one's cognition."

      2.1. Understanding How the Brain Works

      For students to regulate their cognition, they first need to understand its basic mechanisms.

      Action: Talk about the brain in class, at every grade level, and ask students about their own conceptions ("Do we all have the same brain?", "How does it work?").

      Tools: Teaching resources such as the books Découvrir le cerveau à l'école (Canopé), Kididoc : Explore ton cerveau, and C'est (pas) moi, c'est mon cerveau !.

      2.2. Managing and Adapting Attention

      Attention is a limited resource that has to be mastered.

      Action: Set up attention programs that help students discover what attention is, what its limits are, and how to control it on their own (attentional balance, returning to calm).

      Tools: Structured programs such as ATOLE (Apprendre l'ATtention à l'écOLE) for cycles 2 and 3, and ADOLE for middle school and high school.

      2.3. Regulating Memorization

      Effective memorization rests on three pillars: understanding, self-questioning, and repetition.

      Action: Set up routines and tools that structure memorization and review.

      Tools:

      • Memo sheets that summarize what has been learned.

      • Quiz cards written by the students to test themselves.

      • Leitner boxes to organize spaced repetition of each notion.

      • An expanding review calendar to plan review sessions.
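
      The Leitner box and the expanding review calendar listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool described in the source: the five boxes, the interval values in `BOX_INTERVALS`, the default gaps in `expanding_calendar`, and all names (`QuizCard`, `review`, `due_cards`) are hypothetical choices made for the example.

      ```python
      from dataclasses import dataclass
      from datetime import date, timedelta

      # Hypothetical review gaps in days for boxes 1 to 5 (not taken from the source).
      BOX_INTERVALS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}


      @dataclass
      class QuizCard:
          question: str
          answer: str
          box: int = 1          # every new card starts in box 1
          due: date = date.min  # date.min means "due immediately"


      def review(card: QuizCard, correct: bool, today: date) -> QuizCard:
          """Promote the card one box on success, send it back to box 1 on failure."""
          if correct:
              card.box = min(card.box + 1, max(BOX_INTERVALS))
          else:
              card.box = 1
          card.due = today + timedelta(days=BOX_INTERVALS[card.box])
          return card


      def due_cards(cards: list[QuizCard], today: date) -> list[QuizCard]:
          """Return the cards whose next review date has arrived."""
          return [card for card in cards if card.due <= today]


      def expanding_calendar(lesson_day: date, gaps=(1, 3, 7, 14, 30)) -> list[date]:
          """Review dates at growing distances (in days) from the lesson day."""
          return [lesson_day + timedelta(days=gap) for gap in gaps]
      ```

      A card answered correctly moves to a box that is reviewed less often, while a missed card returns to the first box. That is the mechanism that concentrates review time on the notions a student has not yet mastered.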

      2.4. Resisting Automatic Responses and Staying Flexible

      Learning means acquiring automatic responses, but also knowing how to resist them in order to progress.

      Action: Train students to inhibit their reflexes so that they develop new strategies, a critical eye, and a greater tolerance for error.

      Examples:

      ◦ Understanding that the letter "O" does not always produce the sound [o].

      ◦ Switching procedures in mental arithmetic (e.g., to add 9, add 10 and then subtract 1).

      3. Motivation and Implicit Metacognition: The Virtuous Circle of Success

      Motivation is indispensable for engaging in tasks. The sources raise a fundamental question:

      "Do you need to be motivated in order to want to learn and succeed? Or do you need to succeed in order to want to learn and become motivated?" The answer drawn from field experience is that success is the main driver of motivation.

      3.1. Levers for Wanting to Learn

      To spark the desire to learn, it is crucial to create the conditions for success and for enjoying learning.

      Put students in a position to succeed: Performance goals can have harmful effects when students fail, so tasks should be designed so that students see them as accessible.

      Develop motivating projects: Tie learning to concrete, stimulating projects (math rallies, vocabulary walks, the CNR project "J'y arrive !").

      Build on the 4 pillars of motivation:

      • Interest: The pleasure taken in carrying out the task.

      • Importance: The value attached to the task.

      • Effort: The perceived cost in energy.

      • Success: The feeling of competence and actual success.

      3.2. Levers for Being Able to Learn

      Giving students the capacity to learn requires clarifying the framework and the objectives.

      Clarify the learning objectives: Distinguish the real objective from the instruction. Students must understand what they are learning (e.g., not "color in a map" but "learn to produce a map that follows a color code").

      Structure time and activities: Use a "Menu of the day" to make the day's objectives visible and explicit.

      Put learning into words: Introduce a "Learning journal" in which students note what they have understood ("I understood that..."). This supports awareness and ownership of what has been learned.

      4. Strategic Implementation

      Integrating metacognition into teaching practice has to be planned systemically and progressively.

      4.1. Example of a School District (Circonscription) Initiative, 2022-2025

      | Year | Key Actions | Objectives |
      | --- | --- | --- |
      | 2022-2023 | • "Talents du cerveau" conferences.<br>• Seminar on neuromyths and flexibility. | Building a shared culture around metacognition. |
      | 2023-2024 | • Dissemination to school teams (teachers' councils).<br>• Hands-on workshops (F. Guilleray).<br>• Seminar on assessment practices. | Familiarizing teachers with the approach and rolling out the tools. |
      | 2024-2025 | • Conseil École-Collège session on attention and memory skills.<br>• CNR project "J'y arrive" (supported by JF Chesné).<br>• Support for beginning teachers. | Anchoring the practices and monitoring their effects on students. |

      4.2. An Evolution of Professional Practice

      The metacognitive approach is "not a revolution but an evolution of professional practice".

      It calls for streamlining practice under the principle "LESS BUT BETTER", focusing on the strategies that have the greatest impact.

      Conclusion

      Teaching metacognitive knowledge and strategies is a powerful lever for reducing educational inequalities and helping ALL students succeed. By giving students the keys to understanding and regulating their own cognitive functioning, school lets them move from being passive learners to being autonomous, conscious agents of their own learning. The approach equips students to learn more effectively and more calmly throughout their lives.

    1. for user in users_who_liked_our_post:
           display("Yay! " + user + " liked our post!")

       'Yay! @pretend_user_1 liked our post!'
       'Yay! @pretend_user_2 liked our post!'
       'Yay! @pretend_user_3 liked our post!'

      This code looks like it could be used to notify people when others like their posts. That may not be exactly how it is done, but the output reminded me of those notifications. For someone who has a lot of active followers and gets a lot of likes, I imagine a loop like this makes it easier to generate notifications at scale.

    1. The user interface of a computer system (like a social media site) is the part that you view and interact with. It’s what you see on your screen and what you press or type or scroll over. Designers of social media sites have to decide how to lay out information for users to navigate and decide how the user performs various actions (like, retweet, post, look up user, etc.). Some information and actions will be made larger and easier to access while others will be smaller or hidden in menus or settings.

       As we look at these interfaces, there are two key terms we want you to know:

       Affordances are what a user interface lets you do. In particular, it’s what a user interface makes feel natural to do. So for example, an interface might have something that looks like it should be pressed, or an interface might open by scrolling a little so it is clear that if you touch it you can make it scroll more.

       Friction is anything that gets in the way of a user performing an action. For example, if you have to open and navigate through several menus to find the privacy settings, that is significant friction. Or if one of the buttons has a bug and doesn’t work when you press it, so that you have to find another way of performing that action, that is significant friction.

       Designers sometimes talk about trying to make their user interfaces frictionless, meaning the user can use the site without feeling anything slowing them down. Sometimes designers add friction to sites intentionally. For example, ads in mobile games make the “x” you need to press incredibly small and hard to press, to make it harder to leave their ad.

      The user interface is one of the most vital parts of a computer system. Designers must get inside the mind of the user to win their favor, get them to spend more time on the site or app, and make the site friendlier to use. I find it fascinating that programmers are able to code these little interactions in an app so that they feel natural and like common sense, while behind the scenes a lot of work and thought goes into the whole process.

    1. Japanese image-sharing bulletin board called Futaba or 2chan [e19].

      I wonder how this company might seek legal recourse for this action. Does Japanese law have a provision against stealing code? Does U.S. law?

    1. The charges they face are staggering. These men have been indicted for High Treason and Espionage. Under the Belarusian Criminal Code, these charges carry sentences of life imprisonment or even the death penalty

      They are charged with treason and espionage, which carry sentences of life imprisonment or death.

    1. General Terms and Conditions of Solid Deal GmbH for the Use of the TIPAR Platform

       1. Scope
       These General Terms and Conditions apply to all contracts between Solid Deal GmbH (hereinafter "Provider") and customers who use services via the TIPAR platform (www.tipar.de). Deviating terms of the customer are not recognized unless the Provider expressly agrees to their application in writing.

       2. Subject Matter of the Contract
       TIPAR is a digital provision platform for pet owners. The Provider supplies the technical infrastructure for recording, creating, and documenting pet guardianship agreements. This includes optional add-on services such as emergency cards, QR code access, and information packages.

       3. Registration and User Account
       A personal user account is required to use the services. The customer undertakes to provide truthful information when registering and to keep access credentials confidential. Changes to contact details must be reported without delay.
       • Each person may hold only one personal account.
       • The customer is responsible for the accuracy of the information provided.
       • If misuse of the account is suspected, the Provider must be informed without delay.

       4. Conclusion of Contract
       The contract is concluded as soon as the customer completes the ordering process on the platform and the payment has been successfully confirmed. The Provider sends the customer a confirmation by email without delay.

       5. Prices and Payment
       All stated prices are in euros and include statutory value-added tax. Payments are processed by the payment service provider Stripe Payments Europe Ltd.
       • Payment methods: credit card, SEPA direct debit, Apple Pay, Google Pay.
       • The amount is due immediately after the contract is concluded.
       • Invoices are provided electronically.

       6. Right of Withdrawal
       The pet guardianship agreements created via TIPAR are produced individually according to the customer's specifications. Under § 312g (2) no. 1 BGB (German Civil Code) there is therefore no right of withdrawal. By completing the order, the customer confirms having taken note of this exclusion of the right of withdrawal and agrees to it. The statutory right of withdrawal applies to digital add-on products that are not individualized. Further information can be found in the withdrawal instructions.
       • Corrections are possible before individual production begins.
       • Please report change requests without delay to support@tipar.de.

       7. Obligations of Users
       The customer ensures that the data on the animal and the guardians stored in TIPAR is correct and up to date. Changes must be updated promptly. The customer is responsible for ensuring that the named guardians are willing to take over and have been informed.

       8. Liability
       The Provider is liable without limitation for intent and gross negligence. In cases of slight negligence, the Provider is liable only for breaches of essential contractual obligations (cardinal obligations), limited to the foreseeable damage typical for this type of contract. Liability for damage attributable to incorrect or incomplete information provided by the customer is excluded.

       9. Contract Term and Termination
       The contract term depends on the selected plan. Digital access remains active as long as a valid contract exists. Ordinary termination before the end of the agreed term is excluded unless otherwise agreed.

       10. Final Provisions
       The law of the Federal Republic of Germany applies, excluding the UN Convention on Contracts for the International Sale of Goods. The place of performance is the Provider's registered office. Should individual provisions of these terms be invalid, the validity of the remaining provisions is unaffected.

      TIPAR General Terms and Conditions (AGB)

      **General Terms and Conditions of Solid Deal GmbH for the Use of the TIPAR Platform**

      1. Scope

      These General Terms and Conditions apply to all contracts between Solid Deal GmbH, Horneburger Str. 44, 45711 Datteln (hereinafter "Provider") and consumers or businesses (hereinafter "User") who use services via the TIPAR platform at www.tipar.de.

      Deviating terms of the User do not apply unless the Provider expressly agrees to their application in text form.

      2. Subject Matter of the Contract

      TIPAR is a digital provision platform for pet owners. The Provider makes available a technical infrastructure with which Users can record, manage, and document information about animals, named contact persons (e.g., guardians), and supplementary details, in order to provide orientation in an emergency.

      Depending on the package selected, the scope of services may include digital access as well as optional physical products (e.g., emergency cards or QR tags).

      TIPAR does not replace any veterinary, legal, or official decision and does not establish any transfer of ownership of animals.

      2a. Role of TIPAR / Intermediary Service

      TIPAR solely provides a digital platform for documenting, managing, and locating information.

      Any agreement on the actual care, takeover, or provision for an animal is concluded exclusively between the pet owner and the person named by the owner. TIPAR does not become a party to that agreement and assumes no legal, factual, or financial obligation to care for, take over, or provide for an animal.

      In particular, TIPAR gives no guarantee and accepts no liability that named persons will actually carry out the care or takeover of an animal, are able to do so, or can be reached.

      TIPAR's service is limited to providing the technical infrastructure, documenting the information supplied by the User, and making that information digitally retrievable in an emergency.

      The User is responsible for placing, carrying, or attaching notices, tags, emergency cards, or other physical or digital references to TIPAR in such a way that third parties can find and notice them in an emergency.

      TIPAR owes the contractually agreed provision of the platform and the technical retrievability of the information stored by the User, within the agreed scope of services. There is, however, no guarantee and no obligation to achieve a particular result. In particular, TIPAR does not guarantee that third parties (e.g., authorities, emergency responders, finders) will actually find the notices, retrieve the information, or use it, nor that named contact persons can be reached or will actually take over care. Responsibility for placing or carrying notices, tags, or references to TIPAR in each individual case so that third parties can notice them lies with the User.

      3. Registration and User Account

      Use is permitted only for persons of legal age; minors are represented by their legal guardians.

      Using the platform requires creating a personal user account.

      The User undertakes to provide complete and truthful information when registering and to keep it up to date. Access credentials must be kept confidential and may not be passed on to third parties.

      Each person may hold only one user account. The User is responsible for all activity that takes place through their account. If misuse is suspected, the Provider must be informed without delay.

      4. Conclusion of Contract

      The contract is concluded as soon as the User completes the ordering process on the platform and, where paid services have been selected, the payment has been carried out successfully. For consumers, the contract is concluded in electronic commerce via a confirmation button that is clearly labeled as creating an obligation to pay.

      The Provider confirms the conclusion of the contract by email.

      5. Prices and Payment

      Unless stated otherwise, all prices are in euros and include statutory value-added tax.

      Payments are processed by the payment service provider Stripe Payments Europe Ltd. Accepted payment methods include, in particular, credit card, SEPA direct debit, Apple Pay, and Google Pay.

      One-time fees (e.g., the setup fee) are due immediately after the contract is concluded. Invoices are made available to the User electronically.

      Where a renewal has been agreed, the User authorizes, on conclusion of the contract, recurring billing for each contract period via the selected payment method.

      § 5a Delivery and Production of Physical Products (Goodies)

      1. Production / start of manufacturing
         Where the scope of services includes physical products (e.g., emergency cards, QR tags, badges), production generally begins after the ordering process has been completed and payment has succeeded, unless the ordering process specifies otherwise.
      2. Delivery area and shipping
         Delivery is made to the delivery address the User provides during the ordering process. There is a claim to delivery to a particular country only where that country is offered as a delivery area in the ordering process.
      3. Delivery time
         Unless expressly designated as binding, delivery times are non-binding estimates. Partial deliveries are permitted where they are reasonable for the User.
      4. Duty to cooperate: correct delivery address
         The User must provide a complete and correct delivery address and report changes without delay, where technically possible. The User bears any additional costs arising from an incorrect or incomplete address for which the User is responsible (e.g., return shipment, reshipment).
      5. Transfer of risk
         For consumers, the risk of accidental loss or accidental deterioration of the goods passes only when the goods are handed over to the consumer. For businesses, the risk passes when the goods are handed over to the carrier.
      6. Material defects / replacement of defective products
         Statutory warranty rights apply to physical products. The User is asked to report obvious transport damage to the carrier and to the Provider as promptly as possible; the User's statutory rights are unaffected by this.
         Where a complaint about a defect is justified, the Provider will, at its own choice, remedy it by replacement delivery or repair, where this is possible and reasonable.

      § 5b Donation Share / Support for Animal Welfare

      1. Where indicated in the ordering process, a fixed amount from the setup fee is used to support animal welfare organizations (e.g., EUR 5.00).
      2. The support amount is part of the overall pricing. The User may choose a specific organization only where this is expressly offered in the ordering process.
      3. The support amount is not refunded on termination or any other end of the contract.

      6. Right of Withdrawal

      Where the contract includes the delivery of goods that are made individually to the customer's specifications (e.g., personalized emergency cards or tags), there is no right of withdrawal under § 312g (2) no. 1 BGB (German Civil Code).

      For digital services that are not individualized, the statutory right of withdrawal applies where the law provides for it. Details are set out in the separate withdrawal instructions.

      Corrections to the information provided are possible until individual production begins and must be reported without delay.

      7. Obligations of Users

      The User is responsible for ensuring that all information stored in TIPAR about the animal, contact persons, and other matters is correct, complete, and up to date.

      The User may use the platform only for their own legitimate purposes. In particular, it is prohibited to provide false or misleading information, to register animals for which the User has no authorization, or to store data without the knowledge and consent of the persons concerned. The Provider reserves the right to block content or deactivate user accounts in cases of abusive or unlawful use.

      The User ensures that named contact persons have been informed of their role and are in principle willing and able to take on the responsibility assigned to them.

      The User also warrants that they are entitled to store personal data of the named contact persons (e.g., name, email address, telephone number) in TIPAR, and that the named contact persons consent to the storage and use of this data for the purpose of being contacted through TIPAR.

      The User undertakes to update or remove named contact persons without delay at the request of TIPAR or of the contact person, provided there is a legitimate reason to do so.

      The Provider does not verify the actual availability, suitability, or reachability of named persons.

      TIPAR has no influence over whether authorities, emergency responders, or other third parties actually retrieve or use the information provided.

      § 7a Suspension and Termination by the Provider

      1. Suspension on suspicion / protection of the platform
         The Provider may temporarily suspend access to the platform where there are concrete indications of misuse, a breach of these terms, or unlawful use, and the suspension is necessary to avert damage or to secure the platform.
      2. Termination for good cause
         The Provider may terminate the contract extraordinarily for good cause, in particular where the User
         a) intentionally stores false or misleading information,
         b) registers animals for which the User has no authorization,
         c) stores personal data without the required authorization or consent,
         d) uses the platform for deception, spam, abusive requests, or other unlawful purposes, or
         e) circumvents, or attempts to circumvent, security mechanisms or technical protection measures.
      3. Prior deadline / warning notice
         Where reasonable for the Provider, the User will receive a warning notice before an extraordinary termination and will be given a reasonable period to remedy the breach. This does not apply where a remedy is not possible or where immediate termination is justified by the severity of the breach.
      4. Consequences of suspension / termination
         In the event of suspension or termination, the Provider may restrict access to the platform's content and functions. Statutory retention obligations and the Provider's legitimate interests are unaffected.
      5. Refunds
         Where the Provider terminates extraordinarily for good cause for which the User is responsible, there is no claim to a refund of fees already paid. The User's statutory claims are unaffected.

      § 7b User Content, Rights, and Indemnification

      1. User content
         Where TIPAR allows content to be uploaded or stored (e.g., photos, texts, documents, or other files; hereinafter "User Content"), the User alone is responsible for that content.
      2. Rights to User Content
         The User warrants that they hold all necessary rights to the User Content and that its use does not infringe any third-party rights (in particular copyright, trademark, personality, or data protection rights).
      3. Grant of usage rights to the Provider
         The User grants the Provider a simple, non-exclusive right to the User Content, unlimited in territory and valid for the duration of the contractual relationship, to store, reproduce, and technically process the User Content for the purpose of providing the platform, to display it in the user account, and to make it accessible through the retrieval and sharing functions the User has set up. Any further use for advertising or marketing purposes takes place only with the User's separate consent.
      4. Removal of User Content
         The User may remove or modify User Content in the user account, within the limits of what is technically possible. Statutory retention obligations and the Provider's legitimate interests are unaffected.
      5. Indemnification
         The User indemnifies the Provider against all third-party claims asserted against the Provider on the basis of the User Content or any other unlawful use of the platform, unless the Provider is responsible for the infringement. This includes reasonable legal defense costs.

      8. Liability

      The Provider is liable without limitation for intent and gross negligence.

      In cases of slight negligence, the Provider is liable only for breaches of essential contractual obligations (cardinal obligations), limited to the foreseeable damage typical for this type of contract.

      Liability for damage attributable to incorrect, incomplete, or outdated information provided by the User is excluded.

      There is no claim to uninterrupted availability of the platform at all times. Maintenance work, security updates, or technical faults may lead to temporary restrictions.

      § 8a Force Majeure / Third-Party Services

      The Provider is not liable for service disruptions caused by force majeure or by faults at third-party providers for which the Provider is not responsible (e.g., payment service providers, carriers, hosting), provided the Provider takes reasonable steps to remedy them.

      9. Changes to the System / Further Development

      The Provider reserves the right to develop, adapt, or change functions of the platform, provided the purpose of the contract is not materially impaired as a result. For consumers, the provisions of Section 9a additionally apply to changes to digital services.

      9a. Digital Services, Updates, and Changes (Consumers)

      1. Provision in conformity with the contract
         The Provider makes TIPAR's digital services available to the User via the platform, within the agreed functions.
      2. Updates
         Where updates (in particular security and functional updates) are needed to keep the digital services in conformity with the contract, the Provider will supply them within a reasonable period.
      3. User's duties to cooperate
         The User must install the updates provided or carry out the necessary cooperative steps, where this is reasonable for the User and the User has been informed of the consequences of failing to update.
      4. Rights in case of service disruptions / defects
         Where the digital services are not provided in conformity with the contract, the User has the statutory rights. The Provider is first given the opportunity to restore conformity within a reasonable period.
      5. Changes to digital services
         The Provider may change digital services where there is a valid reason (e.g., technical development, security requirements, changes in the law) and the change is reasonable for the User.
         The Provider will inform the User of changes in good time and in a suitable form.
      6. Special right of termination for more than insignificant impairment
         If a change leads to a more than insignificant impairment of the ability to use the digital services, the User may terminate the contract within 30 days of receiving the change notice or of the change taking effect.

      10. Contract Term and Termination

      The contract term depends on the plan selected.

      Where a free first year of use is provided, a paid renewal begins only after that period has expired. Activation is deemed to occur at the point when (i) the ordering process has been completed and (ii) the payment due has been successfully processed and the selected plan has been enabled in the user account. The free first year of use begins at the time of activation and ends after twelve (12) months. The paid contract period under the selected plan begins on the following day. The Provider will inform the User in text form, in good time before the first paid contract period begins, of the upcoming transition to the paid renewal and of the price, term, and notice period. At the end of each contract term, the contract renews for the agreed term unless it is terminated in due time. This also applies after a free first year of use, provided the plan includes a subsequent paid renewal.

      Termination is possible with 30 days' notice to the end of the respective contract term, unless the plan provides otherwise.

      Any statutory special rights of termination, in particular under Section 9a, are unaffected.

      § 10a Online termination (termination function)

      1. Consumers can also terminate contracts for the TIPAR platform online. For this purpose, the provider makes available a directly accessible termination function (e.g. "Terminate contracts here").
      2. The termination can be submitted without additional hurdles; the provider confirms receipt of the termination in text form.
      3. Other means of termination (e.g. by e-mail in text form) remain unaffected.

      § 10b Data access, export and deletion after the end of the contract

      1. After the end of the contract, access to functions and content may be restricted in accordance with the selected tariff.
      2. The user has the option, within the limits of what is technically possible, to export or download the data they have stored before the end of the contract.
      3. After the end of the contract, data are stored as required by statutory retention obligations and are otherwise deleted or anonymised after reasonable periods have expired; details are set out in the privacy policy.

      11. Final provisions

      The law of the Federal Republic of Germany applies, excluding the UN Convention on Contracts for the International Sale of Goods (CISG). Should individual provisions of these General Terms and Conditions (AGB) be or become invalid, the validity of the remaining provisions remains unaffected. Consumer dispute resolution (§ 36 VSBG): Solid Deal GmbH is neither obliged nor willing to participate in dispute resolution proceedings before a consumer arbitration board.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Significance:

      While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the response for a mutation on the "abundance" measurement of VAMP-seq.

      We thank the reviewer for their evaluation of our work and for their comments and feedback below.

      Of course, there is always another layer of the onion, VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here the authors' goal is to create simple models that can act as a baseline for two main reasons:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a direction towards considering static structural information for the VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript) - and shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.

      We thank the reviewer for their summary of the main points of our work. Based on the suggestion by the reviewer, we have added a comparison to predictions with BLOSUM62 to our revised manuscript, noting that we have previously compared the BLOSUM62 matrix to a broader and more heterogeneous set of scores generated by MAVEs (Høie et al, 2022).

      Specific Feedback:

      Major points:

      The authors spend a good amount of space discussing how the six datasets have different distributions in abundance scores. After the development of their model is there more to say about why? Is there something that can be leveraged here to design maximally informative experiments?

      We believe that these effects arise from a combination of intrinsic differences between the systems and assay-specific effects. For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains.

      Also, the sequencing-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library composition differences can contribute to the differences between VAMP-seq score distributions. 
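      The dependence of a sort-seq score on library composition can be illustrated with a small simulation. The following is a simplified sketch rather than the VAMP-seq analysis pipeline: it assumes four equally populated bins gated at the library quartiles, hypothetical bin weights of 0.25 to 1.0, and per-cell signals in place of sequencing read frequencies. The same variant is placed in two toy libraries that differ only in the abundance of the other variants.

```python
import random
from bisect import bisect_right

BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)  # assumed weights for the four FACS bins


def quartile_gates(library_signal):
    """Gate positions that split the whole library into four equally populated bins."""
    ordered = sorted(library_signal)
    n = len(ordered)
    return [ordered[n // 4], ordered[n // 2], ordered[(3 * n) // 4]]


def sort_seq_score(variant_signal, gates):
    """Weighted average of the fraction of the variant's cells found in each bin."""
    counts = [0, 0, 0, 0]
    for value in variant_signal:
        counts[bisect_right(gates, value)] += 1
    total = len(variant_signal)
    return sum(w * c / total for w, c in zip(BIN_WEIGHTS, counts))


rng = random.Random(0)
# One variant with a fixed per-cell signal distribution (toy numbers).
variant = [rng.gauss(0.5, 0.1) for _ in range(2000)]
# Two libraries: one dominated by low-abundance cells, one by high-abundance cells.
mostly_low_library = [rng.gauss(0.2, 0.1) for _ in range(8000)] + variant
mostly_high_library = [rng.gauss(0.8, 0.1) for _ in range(8000)] + variant

score_in_low = sort_seq_score(variant, quartile_gates(mostly_low_library))
score_in_high = sort_seq_score(variant, quartile_gates(mostly_high_library))
print(f"score in low-abundance library:  {score_in_low:.2f}")
print(f"score in high-abundance library: {score_in_high:.2f}")
```

      In this toy setting the variant receives a clearly higher score in the library dominated by low-abundance cells, even though its own signal is identical in both cases.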

      From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons causes the differences. We have briefly expanded the discussion of these points in the manuscript, and we have moreover elaborated on this in subsequent work (Schulze et al., 2025).

      They compare to one more "sophisticated model" - RosettaddG - which should be more correlated with thermodynamic stability than other factors measured by VAMP-seq. However, the direct head-to-head comparison between their matrices and ddG is underdeveloped. How can this be used to dissect cases where thermodynamics are not contributing to specific substitution patterns OR in specific residues/regions that are predicted by one method better than the other? This would naturally dovetail into whether there is orthogonal information between these two that could be leveraged to create better predictions.

      We thank the reviewer for this suggestion and indeed had spent substantial effort trying to gain additional biological insights from variants for which MAVE scores or MAVE predictions do not match predicted ∆∆G values. One major caveat in this analysis is that the experimental MAVE scores, MAVE predictions and the predicted ∆∆G values are rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      In our revised manuscript, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. 

      We find that many substitution profiles are predicted equally well by the two model types, but also that there are residues for which one method predicts substitution effects better than the other method. We have added an analysis of the characteristics of the residues and variants for which either the ∆∆G model or the substitution matrix model is most useful to rank variants. Since we only find relatively few residues for which this is the case, we do not expect a model that leverages predicted scores from both methods to perform better than ThermoMPNN across variants. 

      Perhaps beyond the scope of this baseline method, there is also ThermoMPNN and the work from Gabe Rocklin to consider as other approaches that should be more correlated only with thermodynamics.

      We acknowledge that there are other approaches to predict ∆∆G beyond Rosetta, including for example ThermoMPNN and our own method called RaSP (Blaabjerg et al, eLife, 2023), and we have added comparisons to ThermoMPNN and RaSP in the revised manuscript. We are unsure how one would use the data from Rocklin and colleagues directly, but we note that e.g. RaSP has been benchmarked on these data and other methods have been trained on them. We originally used Rosetta since the Rosetta model is known to be relatively robust and because it has never seen large databases during training (though we do not think that training of ThermoMPNN and RaSP would be biased towards the VAMP-seq data). We note also that we have previously compared both Rosetta calculations and RaSP with VAMP-seq data for TPMT, PTEN and NUDT15 (Blaabjerg et al, eLife, 2023).

      I find myself drawn to the hints of a larger idea that outliers to this model can be helpful in identifying specific aspects of proteostasis. The discussion of S109 is great in this respect, but I can't help but feel there is more to be mined from Figure S9 or other analyses of outlier higher than predicted abundance along linear or tertiary motifs.

      We agree with these points and have previously spent substantial time trying to make sense of outliers in Figure S9 and Figure S18 (Figure S8 and Figure S18 of revised manuscript). The outlier analysis was challenging, in part due to the relatively high noise levels in both experimental data and predictions, and we did not find any clear signals. Some outliers in e.g. Figure S9 are very likely the result of dataset-specific abundance score distributions, which further complicates the outlier analysis. We now note this in the revised paper and hope others will use the data to gain additional insights on proteostasis-specific effects.  

      Reviewer #2 (Public review):

      Summary:

      This study analyzes protein abundance data from six VAMP-seq experiments, comprising over 31,000 single amino acid substitutions, to understand how different amino acids contribute to maintaining cellular protein levels. The authors develop substitution matrices that capture the average effect of amino acid changes on protein abundance in different structural contexts (buried vs. exposed residues). Their key finding is that these simple structure-based matrices can predict mutational effects on abundance with accuracy comparable to more complex physics-based stability calculations (ΔΔG).

      Major strengths:

      (1) The analysis focuses on a single molecular phenotype (abundance) measured using the same experimental approach (VAMP-seq), avoiding confounding factors present when combining data from different phenotypes (e.g., mixing stability, activity, and fitness data) or different experimental methods.

      (2) The demonstration that simple structural features (particularly solvent accessibility) can capture a significant portion of mutational effects on abundance.

      (3) The practical utility of the matrices for analyzing protein interfaces and identifying functionally important surface residues.

      We thank the reviewer for the comments above and the detailed assessment of our work.

      Major weaknesses:

      (1) The statistical rigor of the analysis could be improved. For example, when comparing exposed vs. buried classification of interface residues, or when assessing whether differences between prediction methods are significant.

      We agree with the reviewer that it is useful to determine if interface residues (or any of the residues in the six proteins) can confidently be classified as buried- or exposed-like in terms of their substitution profiles. Thus, we have expanded our approach to compare individual substitution profiles to the average profiles of buried and exposed residues to now account for the noise in the VAMP-seq data. In our updated approach, we resample the abundance score substitution profile for every residue several thousand times based on the experimental VAMP-seq scores and score standard deviations, and we then compare every resampled profile to the average profiles for buried and exposed residues, thereby obtaining residue-specific distributions of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. These RMSD distributions are typically narrow, since many variants in several datasets have small standard deviations. In the revised manuscript, we report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the resampled profiles. We do not recalculate average scores in substitution matrices for this analysis.
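      The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each score is assumed to be perturbed independently with Gaussian noise of its reported standard deviation, profiles are assumed to have no missing values, and the function names, the number of resamples and the toy profiles are hypothetical.

```python
import math
import random


def rmsd(profile, reference):
    """Root-mean-square deviation between two equally long score profiles."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(profile, reference)) / len(profile))


def classify_profile(scores, sds, buried_avg, exposed_avg,
                     n_resamples=2000, cutoff=0.95, seed=0):
    """Return 'buried-like', 'exposed-like' or 'unassigned' from resampled profiles."""
    rng = random.Random(seed)
    buried_wins = 0
    exposed_wins = 0
    for _ in range(n_resamples):
        # Assumption: independent Gaussian noise per variant, using the reported SD.
        resampled = [rng.gauss(s, sd) for s, sd in zip(scores, sds)]
        d_buried = rmsd(resampled, buried_avg)
        d_exposed = rmsd(resampled, exposed_avg)
        if d_buried < d_exposed:
            buried_wins += 1
        elif d_exposed < d_buried:
            exposed_wins += 1
    if buried_wins / n_resamples >= cutoff:
        return "buried-like"
    if exposed_wins / n_resamples >= cutoff:
        return "exposed-like"
    return "unassigned"


# Toy average profiles over 19 substitutions (hypothetical values).
buried_avg = [0.3] * 19
exposed_avg = [0.8] * 19

confident = classify_profile([0.35] * 19, [0.05] * 19, buried_avg, exposed_avg)
ambiguous = classify_profile([0.55] * 19, [0.30] * 19, buried_avg, exposed_avg)
print(confident, ambiguous)
```

      A residue whose scores sit close to the buried average with small standard deviations is assigned confidently, whereas a noisy profile halfway between the two averages is left unassigned.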

      Moreover, to illustrate potential overlap in predictive performance between prediction methods more clearly than in our preprint, we have added confidence intervals in Fig. 2 and Fig. 3 of the revised manuscript. We note that the analysis in Fig. 2 is performed using a leave-one-protein-out approach, which we believe provides the cleanest assessment of how well the different models perform.
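      A leave-one-protein-out evaluation of the kind mentioned above might look like the sketch below. It is an illustration only: the data layout, the function names and the toy scores are hypothetical, the matrix here ignores structural context, and Spearman's rank correlation is used as the example metric.

```python
from collections import defaultdict


def average_matrix(datasets):
    """Mean score per (wild type, mutant) pair over all variants in the given datasets."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for variants in datasets:
        for wt, mut, score in variants:
            sums[(wt, mut)] += score
            counts[(wt, mut)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def ranks(values):
    """Average ranks (tied values share their mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def leave_one_protein_out(data):
    """Spearman's rho per held-out protein, predicting from the other proteins only."""
    results = {}
    for held_out, variants in data.items():
        matrix = average_matrix(v for name, v in data.items() if name != held_out)
        pairs = [(matrix[(wt, mut)], score) for wt, mut, score in variants if (wt, mut) in matrix]
        predicted, observed = zip(*pairs)
        results[held_out] = pearson(ranks(predicted), ranks(observed))
    return results


# Toy data: (wild type, mutant, abundance score) per protein.
toy = {
    "protein_A": [("L", "P", 0.1), ("L", "I", 0.9), ("K", "R", 0.8), ("G", "W", 0.3)],
    "protein_B": [("L", "P", 0.2), ("L", "I", 0.8), ("K", "R", 0.9), ("G", "W", 0.2)],
    "protein_C": [("L", "P", 0.0), ("L", "I", 1.0), ("K", "R", 0.7), ("G", "W", 0.4)],
}
rho = leave_one_protein_out(toy)
print(rho)
```

      The key property is that no score from the held-out protein contributes to the matrix used to predict it.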

      (2) The mechanistic connection between stability and abundance is assumed rather than explained or investigated. For instance, destabilizing mutations might decrease abundance through protein quality control, but other mechanisms like degron exposure could also be at play.

      We agree that we have not provided much description of the relation between stability and abundance in our original preprint. In the revised manuscript, we provide some more detail as well as references to previous literature explaining the ways in which destabilising mutations can cause degradation. We have moreover performed and added additional analyses of the relationship between thermodynamic stability and abundance through comparisons of stability predictions and predictions performed with our substitution matrix models.

      (3) The similar performance of simple matrix-based and complex physics-based predictions calls for deeper analysis. A systematic comparison of where these approaches agree or differ could illuminate the relationship between stability and abundance. For instance, buried sites showing exposed-like behavior might indicate regions of structural plasticity, while the link between destabilization and degradation might involve partial unfolding exposing typically buried residues. The authors have all the necessary data for such analysis but don't fully exploit this opportunity.

      This is similar to a point made by reviewer 1, and our answer is similar. We were indeed hoping that our analyses would have revealed clearer differences between effects on thermodynamic protein stability and cellular abundance and have tried to find clear signals. One major caveat in performing the suggested analysis is that the experimental MAVE scores, the ∆∆G predictions and our simple matrix-based predictions are all rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      To address this point, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. We find that many substitution profiles are predicted equally well by the two model types, but we also, in particular, find solvent-exposed residues for which the substitution matrix model is the better predictor. These residues are often aspartate, glutamate and proline, suggesting that surface-level substitutions of these amino acid types often can have effects that are not captured well by a thermodynamic model, either because this model does not describe thermodynamic effects perfectly, or because in-cell effects must be accounted for to provide an accurate description.

      (4) The pooling of data across proteins to construct the matrices needs better justification, given the observed differences in score distributions between proteins (for example, PTEN's distribution is shifted towards high abundance scores while ASPA and PRKN show more binary distributions).

      We agree with the reviewer that the differences between the score distributions are important to investigate further and keep in mind when analysing e.g. prediction outliers. However, our results show that the pooling of VAMP-seq scores across proteins does result in substitution matrices that make sense biochemically and can identify outlier residues with proteostatic functions. As we also respond to a related point by reviewer 1, the differences in score distributions likely have complex origins. In that sense, we also hope that our results can inspire experimentalists to design methods to generate data that are more comparable across proteins.

      For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains. Also, the sequencing-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library composition differences can contribute to the differences between VAMP-seq score distributions. From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons causes the differences.

      Thus, even when experiments on different proteins are performed using the same technique (VAMP-seq), quantifying the same phenomenon (cellular abundance) and done in similar ways (saturation mutagenesis, sort-seq using four FACS bins), there can still be substantial differences in the results across different systems. An interesting side result of our work is that it highlights this, including how such variation makes it difficult to learn across experiments. We now elaborate on these points in the revised manuscript.

      (5) Some key methodological choices require better justification. For example, combining "to" and "from" mutation profiles for PCA despite their different behaviors, or using arbitrary thresholds (like 0.05) for residue classification.

      We hope we have explained our methodological choices more clearly in the revised paper.

      We removed the dependency on the threshold of 0.05 used for residue classification in Fig. S19 of the original manuscript; in the revised manuscript we only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the abundance score profiles that we resampled according to VAMP-seq score noise levels, as explained above.

      With respect to combining “to” and “from” mutational profiles for PCA, we could have also chosen to analyse these two sets of profiles separately to take potentially different behaviours along the two mutational axes into account. We do not think that there should be anything wrong with concatenating the two sets of profiles in a single analysis, since the analysis on the concatenated profiles simply expresses amino acid similarities and differences in a more general manner.

      The authors largely achieve their primary aim of showing that simple structural features can predict abundance changes. However, their secondary goal of using the matrices to identify functionally important residues would benefit from more rigorous statistical validation. While the matrices provide a useful baseline for abundance prediction, the paper could offer deeper biological insights by investigating cases where simple structure-based predictions differ from physics-based stability calculations.

      This work provides a valuable resource for the protein science community in the form of easily applicable substitution matrices. The finding that such simple features can match more complex calculations is significant for the field. However, the work's impact would be enhanced by a deeper investigation of the mechanistic implications of the observed patterns, particularly in cases where abundance changes appear decoupled from stability effects.

      We agree that disentangling stability and other effects on cellular abundance is one of the goals of this work. As discussed above, it has been difficult to find clear cases where amino acid substitutions affect abundance without stability beyond for example the (rare) effects of creating surface exposed degrons. Our new analysis, in which we compare substitution matrix-based predictions to stability predictions, does offer deeper insight into the relationship between the two predictor types and hence possibly between folding stability and abundance. 

      Reviewer #3 (Public review): 

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance (and thus stability) by utilizing structural information, specifically residue solvent accessibility and secondary structure type, to derive combinations of context-specific substitution matrices predicting variant abundance. They are clear to outline that the aim of the study is not to produce a new best abundance predictor but to showcase the degree of prediction afforded simply by utilizing information on residue accessibility. The performance of their matrices is robustly evaluated using a leave-one-out approach, where the abundance effects for a single protein are predicted using the remaining datasets. Using a simple classification of buried and solvent-exposed residues, and substitution matrices derived respectively for each residue group, the authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structure-unaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility or secondary structure. Interestingly, it is shown that the performance of the simple buried and exposed residue substitution matrices for predicting protein abundance is on par with Rosetta, an established and specialized protein variant stability predictor. More importantly, the authors finish off the paper by demonstrating the utility of the two matrices to identify surface residues that have buried-like substitution profiles, that are shown to correspond to protein interface residues, posttranslational modification sites, functional residues, or putative degrons.

      Strengths:

      The paper makes a strong and well-supported main point, demonstrating the utility of the authors' approach through performance comparisons with alternative substitution matrices and specialized methods alike. The matrices are rigorously evaluated without introducing bias, exploring various combinations of protein datasets. Supplemental analyses are extremely comprehensive and detailed. The applicability of the substitution matrices is explored beyond abundance prediction and could have important implications in the future for identifying functionally relevant sites.

      We thank the reviewer for the supportive comments on our work. 

      Comments:

      (1) A wider discussion of the possible reasons why matrices for certain proteins seem to correlate better than others would be extremely interesting, touching upon possible points like differences or similarities in local environments, degradation pathways, posttranslation modifications, and regulation. While the initial data structure differences provide a possible explanation, Figure S17A, B correlations show a more complicated picture.

      We agree with the reviewer that biochemical and biophysical differences between the proteins might contribute to the fact that some matrices correlate better than others. We also agree that it would be very interesting to understand these differences better. While it might be possible to examine some of the suggested causes of the differences, like differences or similarities in local environments, we have generally found that noise and differences in score distributions make such analyses difficult (see also responses to reviewers 1 and 2). For now, we will defer additional analyses to future work.

      (2) The performance analysis in Figure 2D seems to show that for particular proteins "less is more" when it comes to which datasets are best to derive the matrix from (CYP2C9, ASPA, PRKN). Are there any features (direct or proxy), that would allow to group proteins to maximize accuracy? Do the authors think on top of the buried vs exposed paradigm, another grouping dimension at the protein/domain level could improve performance?

      We don’t currently know if any protein- or domain-level features could be used to further split residues into useful categories for constructing new substitution matrices, but it is an interesting suggestion. We note that every substitution matrix consists of 380 averages, and creating too many residue groupings will cause some matrix entries to be averaged over very few abundance scores, at least with the current number of scores in the pooled VAMP-seq dataset. For example, while previous work has shown different mutational effects e.g. in helices and sheets (as one would expect), we find that a model with six matrices ({buried,exposed}x{helix,sheet,other}) does not lead to improved predictions (Fig. 2C), presumably because of an unfavourable balance between parameters and data.
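      The balance between parameters and data can be made concrete with a short sketch. It is illustrative only: the function names and the toy variants are hypothetical, and the figure of 31,000 scores is the approximate size of the pooled dataset quoted in the reviews. Each matrix has 20 × 19 = 380 entries, so splitting residues into more contexts leaves fewer scores behind every average.

```python
from collections import defaultdict
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Every ordered pair of distinct residues: 20 * 19 = 380 entries per matrix.
SUBSTITUTIONS = list(permutations(AMINO_ACIDS, 2))


def build_matrices(variants):
    """Mean score per (context, wild type, mutant), plus the number of scores per entry."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for context, wt, mut, score in variants:
        key = (context, wt, mut)
        sums[key] += score
        counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    return means, dict(counts)


def mean_scores_per_entry(n_scores, n_matrices):
    """Scores available per matrix entry if they were spread perfectly evenly."""
    return n_scores / (n_matrices * len(SUBSTITUTIONS))


# Toy variants: (context, wild type, mutant, abundance score).
toy_variants = [
    ("buried", "L", "P", 0.1),
    ("buried", "L", "P", 0.3),
    ("exposed", "L", "P", 0.7),
]
means, counts = build_matrices(toy_variants)

print(len(SUBSTITUTIONS))                          # entries per matrix
print(round(mean_scores_per_entry(31_000, 2), 1))  # buried/exposed
print(round(mean_scores_per_entry(31_000, 6), 1))  # {buried,exposed} x {helix,sheet,other}
```

      The even-spread figure is an upper bound on the typical entry, since real substitutions are unevenly distributed across contexts and some entries will be averaged over far fewer scores.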

      (3) While the matrices and Rosetta seem to show similar degrees of correlation, do the methods both fail and succeed on the same variants? Or do they show a degree of orthogonality and could potentially be synergistic?

      These are good questions and are related to similar questions from reviewers 1 and 2. In the revised manuscript, we have added additional analyses of differences between predictions from our substitution matrix model and a stability model, and we indeed find that the two methods show a degree of orthogonality. However, since we identify only relatively few residues for which one method performs better than the other, we don’t expect a synergistic model to outperform the stability predictor across all variants in any of the six proteins.  

      Overall, this work presents a valuable contribution by creatively utilizing a simple concept through cutting-edge datasets, which could be useful in various.

      Reviewing Editor:

      As discussed in more detail below, to strengthen the assessment, the authors are encouraged to:

      (1) Include more thorough statistical analyses, such as confidence intervals or standard errors, to better validate key claims (e.g., RMSD comparisons).

      (2) Perform a deeper comparison between substitution response matrices and ΔΔG-based predictions to uncover areas of agreement or orthogonality.

      (3) Clarify the relationship between structural features, stability, and abundance to provide more mechanistic insights.

      As discussed above and below, we have added new analyses and clarifications to the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      Why is a continuous version of the contact number used here, instead of a discrete count of neighbouring residues? WCN values of the residues in the core domain can be affected by residues far away (small contribution but not strictly zero; if there are many of them, it adds up).

      We have previously found WCN, which quantifies residue contact numbers in a continuous manner, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also found WCN and the cellular abundance of single substitution variants to correlate well in individual analyses of different proteins (Grønbæk-Thygesen et al., 2024; Gersing et al., 2024; Clausen et al., 2024).

      We have calculated the WCN as well as a contact number based on discrete counts of neighbouring residues for the six proteins in our dataset. When distances between residues are evaluated in the same way (i.e. using the shortest distance between any pair of heavy atoms in the side chains), and when the cutoff value used for the discrete count is equal to the r<sub>0</sub> of the WCN function, the continuous and discrete evaluations of residue contact numbers are highly and linearly correlated, and their rank correlations with the VAMP-seq data are very similar. We only observe minor contributions to the WCN from residues far away in the structure.
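      The comparison between continuous and discrete contact numbers can be sketched as below. The switching function 1 / (1 + (r / r<sub>0</sub>)^6), the value r<sub>0</sub> = 7 Å and the toy distances are assumptions made for illustration; the exact functional form and parameters used in the manuscript are not stated in this response.

```python
def weighted_contact_number(distances, r0=7.0, power=6):
    """Continuous contact number: each neighbour contributes a weight between 0 and 1."""
    return sum(1.0 / (1.0 + (d / r0) ** power) for d in distances)


def discrete_contact_number(distances, cutoff=7.0):
    """Discrete contact number: count the neighbours closer than the cutoff."""
    return sum(1 for d in distances if d < cutoff)


# Toy distances (in Å) from one residue to six others: three near, three far.
near = [4.0, 5.0, 6.0]
far = [20.0, 25.0, 30.0]

wcn = weighted_contact_number(near + far)
far_contribution = weighted_contact_number(far)
count = discrete_contact_number(near + far)

print(f"WCN = {wcn:.3f}, of which {far_contribution:.4f} comes from distant residues")
print(f"discrete count = {count}")
```

      With this assumed switching function, each distant residue adds well under 1% of a contact, so many such residues would be needed before their sum became noticeable.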

      Typos in SI figure captions e.g. Figure S8-11 "All predictions were performed using using...."

      Thank you for pointing this out. We have corrected the typos in Figure S8-11 (Figure S7-S10 in the revised manuscript).

      Personally, I'd appreciate a definition of these new substitution matrices under the constraints of rASA/WCN values. It was unclear to me until I read the code but we think that the definition is averaging the substitution matrix based on the clusters they are assigned to. If so, this could be straightforwardly defined in the method section with a heaviside step function.

      We have added a definition of the “buried” and “exposed” substitution matrices as a function of rASA in the methods section (“Definitions of buried and exposed residues” and “Definition of substitution matrices”) of the manuscript, as well as a definition of how we classified residues as either buried or exposed using both rASA and WCN as input. Our final substitution matrices, as shown in e.g. Fig. 2, do not depend on the WCN; only the substitution matrix results in Figure S6 (Figure S20 in the revised manuscript) depend on both WCN and rASA.

      Reviewer #2 (Recommendations for the authors):

      The following suggestions aim to strengthen the analysis and clarify the presentation of your findings:

      (1) Specific analyses to consider:

      (1.1) Analyze buried positions where the exposed matrix performs better. Understanding these cases might reveal properties of protein core regions that show unexpected mutational tolerance.

      We agree with the reviewer that a more detailed analysis of buried residues with exposed-like substitution profiles would be very interesting.

      We note that for proteins where the VAMP-seq score distribution is shifted towards high values (as is the case for PTEN, TPMT and CYP2C9), our identification of such residues may be a result of the score distribution differences between the six datasets. To confidently identify mutationally tolerant core regions, it would be best to (a) correct for the distribution differences prior to the analysis or (b) focus the analysis on residues that fall far below the diagonal in Figure S18.

      In additional data (which can be found at https://github.com/KULL-Centre/_2024_Schulze_abundance-analysis), we provide, for each of the proteins, a list of buried residues for which RMSD<sub>exposed</sub> < RMSD<sub>buried</sub> (for more than 95% of resampled substitution profiles, as described under 1.6). We have not analysed these residues further.

      (1.2) A systematic comparison of matrix-based vs. ΔΔG-based predictions could help understand both exposed sites that behave as buried (as analyzed in the paper) and buried sites that behave as exposed (1.1), potentially revealing mechanisms underlying abundance changes.

      In our revised manuscript, we have added additional analyses to compare matrix-based and ΔΔG-based predictions, focusing on exposed sites for which one prediction method captures variant effects on abundance considerably better than the other prediction method. We have not investigated buried sites with exposed-like behaviour any further in this work.

      (1.3) Explore different normalization approaches when pooling data across proteins. In particular, consider using log(abundance score): if the experimental error in abundance measurements is multiplicative (which can be checked from the reported standard errors), then log transformation would convert this into a constant additive error, making the analysis more statistically sound.

      As we answer below to point 2.2, the abundance scores are, within each dataset, min-max normalised to nonsense and synonymous variant scores, and the score scale is thus consistent across the six datasets. We have explained above and in the revised manuscript that abundance score distribution differences across datasets are likely partially a result of the FACS binning of assay-specific variant libraries. Using only the VAMP-seq scores (that is, without further information about the individual experiments), we cannot correct for the influence of the sorting strategy on the reported scores. A score normalisation across datasets that places all data points on a single scale would require inter-dataset reference variant scores, which we do not have. We note that in a subsequent manuscript (Schulze et al, bioRxiv, 2025) we have attempted to take system- and experiment-specific score distributions into account. We now refer to this work in the revised manuscript.

      (1.4) Consider using correlation coefficients between predicted and observed abundance profiles as an alternative to RMSD, which is sensitive to the absolute values of the scores.

      We agree with the reviewer that using correlation coefficients to compare substitution profiles might also be useful, in particular for datasets with relatively unique VAMP-seq score distributions, such as the ASPA dataset. To explore this idea, we have repeated the analysis presented in Fig. S18 using the Pearson correlation coefficient r rather than the RMSD.

      As in Fig. S18, we derive r<sub>buried</sub> and r<sub>exposed</sub> for every residue in the six proteins, specifically by calculating r between the abundance score substitution profile of every individual residue and the average abundance score substitution profiles of buried and exposed residues. VAMP-seq data for the protein for which r<sub>buried</sub> and r<sub>exposed</sub> are evaluated is omitted from the calculation of average abundance score substitution profiles, and we use only monomer structures to determine whether residues are buried or exposed. 
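
      The per-residue correlation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `profile_correlation`, the handling of missing variants as NaN, and the minimum of three shared variants are all assumptions made for this example.

```python
import numpy as np

def profile_correlation(residue_profile, average_profile):
    """Pearson r between a residue's substitution profile and an average profile.

    Assumption: missing abundance scores are encoded as NaN and are skipped.
    Returns NaN when r is undefined (too few shared variants or zero variance).
    """
    x = np.asarray(residue_profile, dtype=float)
    y = np.asarray(average_profile, dtype=float)
    shared = ~np.isnan(x) & ~np.isnan(y)
    if shared.sum() < 3:  # hypothetical minimum number of shared variants
        return float("nan")
    x, y = x[shared], y[shared]
    if x.std() == 0 or y.std() == 0:
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])

# r_buried and r_exposed for one residue would then be
# profile_correlation(profile, avg_buried) and profile_correlation(profile, avg_exposed),
# with the average profiles computed from the other proteins only.
```

      Because r is invariant to shifts and rescaling of the scores, it ignores the absolute score level that RMSD is sensitive to.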

      We show the results of this analysis in Author response image 1 below. In each panel of the figure, r<sub>buried</sub> and r<sub>exposed</sub> are shown for individual residues of a single protein. Blue datapoints indicate residues that are solvent-exposed in the wild-type protein structures, and yellow datapoints indicate residues that are buried in the wild-type structures. Residues for which neither r<sub>buried</sub> < r<sub>exposed</sub> nor r<sub>exposed</sub> < r<sub>buried</sub> holds in more than 95% of 1000 resampled residue substitution profiles (see explanation of resampling method above) are coloured grey. “Acc.” is the balanced classification accuracy, calculated using all non-grey datapoints, indicating how many buried residues have buried-like substitution profiles (r<sub>exposed</sub> < r<sub>buried</sub>) and how many solvent-exposed residues have exposed-like substitution profiles (r<sub>buried</sub> < r<sub>exposed</sub>). The classification accuracy per protein in this figure cannot be compared to the classification accuracy of the same protein in Fig. S18, since the number of datapoints used in the accuracy calculation differs between the r- and RMSD-based analyses.

      Author response image 1.

      Comparing the r-based approach to the RMSD-based approach (Fig. S18), it is clear that the r-based method is less robust than the RMSD-based method for noisy and incomplete datasets. For the noisiest and most mutationally incomplete VAMP-seq datasets (i.e., PTEN, TPMT and CYP2C9) (Fig. 1), there are relatively few residues for which we can determine with high confidence whether the substitution profile is more buried- or more exposed-like. When the VAMP-seq data is less noisy and has high mutational completeness, the r-based method becomes more robust and may thus be relevant in potential future work on new VAMP-seq data with small error bars.

      In conclusion, we find that the RMSD-based approach to comparing substitution profiles is more robust than an r-based approach for several of the VAMP-seq datasets that are included in our analysis. We do believe that an approach based on the correlation coefficient, or potentially several metrics, could be relevant to use, since abundance score distributions from VAMP-seq datasets can differ significantly across datasets. So as not to increase the length of the main text of our manuscript, we have not added this analysis to the revised manuscript.

      (1.5) Consider treating missing abundance scores as zero values, as they might indicate variants with very low abundance, rather than omitting them from the analysis.

      This suggestion would be most relevant for the PTEN, TPMT and CYP2C9 datasets, which all have a relatively small average mutational depth and completeness, as shown in Fig. 1B and 1C. To assess if setting missing abundance scores as zero values would be reasonable, we have compared the distributions of predicted ΔΔG values (from RaSP and ThermoMPNN) and of predicted abundance scores (from our exposure-based substitution matrices) for variants with reported and missing VAMP-seq data. We show the result in Author response image 2, with data aggregated across the six protein systems:

      Author response image 2.

      We find that variants with and without VAMP-seq data have similar ΔΔG score distributions and similar predicted abundance score distributions, and there is thus no clear enrichment of predicted loss of abundance for variants with missing VAMP-seq scores. This suggests that missing abundance scores do not necessarily indicate very low abundance. One cause of missing data might instead be problems with library generation (Matreyek et al, 2018, 2021).

      We show in Fig. S9 (Fig. S8 of the revised manuscript) that predicted scores for variants with experimental abundance scores of 0 are often overestimated for NUDT15, ASPA and PRKN, but this is not so much a problem for PTEN, TPMT and CYP2C9, the datasets with most missing scores. The lack of an enrichment of low abundance variants from the various predictors would thus still support that missing scores do not necessarily indicate low abundance.

      (1.6) Develop a proper statistical framework for comparing buried vs exposed predictions (whether using RMSD or correlations), including confidence intervals, rather than using arbitrary thresholds.

      As explained above and in the methods section of our revised manuscript, we have expanded our approach to compare the substitution profile of a residue to the average profiles of buried and exposed residues, and our method now accounts for the noise in the VAMP-seq data, making the analysis more statistically rigorous. In our expanded approach, we compare the substitution profiles of individual residues to the average profiles for buried and exposed residues 10,000 times per residue to get a residue-specific distribution of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. Individual RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values are calculated by resampling abundance scores from a Gaussian distribution defined by the experimentally reported abundance score and abundance score standard deviation per variant. We now only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> in at least 95% of our samples. We do not recalculate average scores in substitution matrices for this analysis. We have updated the plots in our manuscript, e.g. in Fig. S18 and S19 of the revised version, to indicate which residues are confidently classified as buried- or exposed-like.
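
      The resampling procedure described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name `classify_profile`, the NaN encoding of missing variants, the fixed random seed, and the "uncertain" label are introduced here for the example, and the average profiles are assumed to be complete for every observed variant.

```python
import numpy as np

def classify_profile(scores, score_sds, avg_buried, avg_exposed,
                     n_samples=10_000, threshold=0.95, seed=0):
    """Label one residue's substitution profile as buried-like or exposed-like.

    Returns (label, fraction of samples with RMSD_buried < RMSD_exposed).
    """
    scores = np.asarray(scores, dtype=float)
    score_sds = np.asarray(score_sds, dtype=float)
    avg_buried = np.asarray(avg_buried, dtype=float)
    avg_exposed = np.asarray(avg_exposed, dtype=float)

    # Assumption: variants lacking a score or a standard deviation are skipped.
    observed = ~np.isnan(scores) & ~np.isnan(score_sds)
    n_obs = int(observed.sum())
    if n_obs == 0:
        return "uncertain", float("nan")

    # Resample every observed score from a Gaussian defined by the reported
    # score (mean) and its reported standard deviation.
    rng = np.random.default_rng(seed)
    samples = rng.normal(scores[observed], score_sds[observed],
                         size=(n_samples, n_obs))

    # One RMSD per resampled profile against each fixed average profile.
    rmsd_buried = np.sqrt(np.mean((samples - avg_buried[observed]) ** 2, axis=1))
    rmsd_exposed = np.sqrt(np.mean((samples - avg_exposed[observed]) ** 2, axis=1))

    frac_buried = float(np.mean(rmsd_buried < rmsd_exposed))
    frac_exposed = float(np.mean(rmsd_exposed < rmsd_buried))
    if frac_buried >= threshold:
        return "buried-like", frac_buried
    if frac_exposed >= threshold:
        return "exposed-like", frac_buried
    return "uncertain", frac_buried
```

      As in the text, the average profiles are held fixed across samples; only the scores of the residue being classified are resampled.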

      (2) Presentation improvements:

      (2.1) In Figure 4, consider removing the average abundance scores, which are not directly related to the RMSD comparison being shown.

      We have decided to keep the average abundance scores in Fig. 4 (now Fig. 5), as we find the average abundance scores useful for guiding interpretation of the RMSD values. For example, an unusually small average abundance score with a relatively small standard deviation may explain a case where RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> are both large. This is for example the case for residue G185 in ASPA. 

      In our preprint, the error bars on the average abundance scores in Fig. 4 (now Fig. 5) indicated the standard deviation across the abundance scores that were used to calculate the average per position. We have removed these error bars in the revised manuscript, as we realised that these were not necessarily helpful to the reader.

      (2.2) I am assuming that abundance scores are defined as the ratio abundance_variant/abundance_wt throughout the analysis, but I don't think this has been explicitly defined. If this is correct, please state it explicitly. In such case, log(abundance_score) would have a simple interpretation as the difference in abundance between variant and wild-type.

      Abundance scores are defined throughout the manuscript as sequence-based scores that have been min-max normalised to the abundance of nonsense and synonymous variants, i.e. abundance_score = (abundance_variant – abundance_nonsense)/(abundance_wt – abundance_nonsense). We have described the normalisation of scores to wild-type and nonsense variant abundance in lines 164-166 of the original manuscript. We have now added additional information about the normalisation scheme in the methods section. We note that we did not ourselves apply this normalisation to the data; the scores were reported in this manner in the original publications that reported the VAMP-seq experiments for the six proteins.
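
      As a small worked illustration of this min-max normalisation, the sketch below maps the nonsense reference to 0 and the wild-type-like reference to 1. The function and argument names are hypothetical, and how the two reference abundances are derived from the raw data (for example, as averages over nonsense and synonymous variants) is left to the original VAMP-seq publications.

```python
def normalise_abundance(raw_variant, raw_nonsense, raw_wt):
    """Min-max normalise a raw abundance value to nonsense (0) and wild-type (1).

    raw_nonsense and raw_wt are reference values supplied by the caller;
    this sketch does not prescribe how they are estimated.
    """
    span = raw_wt - raw_nonsense
    if span == 0:
        raise ValueError("wild-type and nonsense references must differ")
    return (raw_variant - raw_nonsense) / span

# A variant halfway between the two references gets a score of about 0.5;
# values below the nonsense reference or above wild-type fall outside [0, 1].
```

      On this scale a score is a position between two reference points rather than a ratio to wild-type, so its logarithm does not have the simple interpretation suggested for a variant/wild-type ratio.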

      (2.3) Consider renaming "rASA" to the more commonly used "RSA" for relative solvent accessibility.

      We have decided to keep using “rASA” throughout the manuscript.

      (2.4) The weighted contact number function used differs from the established WCN measure (Σ1/rij²) introduced by Lin et al. (2008, Proteins). This should be acknowledged and the choice of alternative weighting scheme justified.

      As we have also responded to the first minor point of reviewer 1, we have previously found WCN, as it is defined in our manuscript, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also previously found this type of WCN to correlate well with variant abundance of individual proteins, as measured with VAMP-seq or protein fragment complementation assays (Grønbæk-Thygesen et al., 2024; Clausen et al., 2024; Gersing et al., 2024). We acknowledge that residue contact numbers or weighted contact numbers could also be expressed in other ways and that alternative contact number definitions would likely also produce values that correlate well with VAMP-seq data. Since the WCN, as defined in our manuscript, already correlates relatively well with abundance scores, we have not explored whether alternative definitions produce better correlations.  
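
      For reference, the inverse-square weighted contact number cited by the reviewer (Σ1/rij²) can be sketched as below. This is not the WCN definition used in the manuscript, which is not reproduced in this response; the function name and the choice of one coordinate per residue (for example, Cα atoms) are assumptions made for the illustration.

```python
import numpy as np

def inverse_square_wcn(coords):
    """Per-residue sum of 1/r_ij^2 over all other residues j.

    coords: array of shape (n_residues, 3), one representative atom per
    residue (assumed here; e.g. C-alpha positions in angstroms).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Exclude self-pairs: an infinite squared distance contributes 1/inf = 0.
    np.fill_diagonal(sq_dist, np.inf)
    return np.sum(1.0 / sq_dist, axis=1)
```

      Residues in densely packed regions accumulate many short-distance terms and therefore receive larger values than residues at the surface.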

      (2.5) Replace the phrase "in the above" with specific references to sections or simply "above" where appropriate. Also, consider replacing many instances of "moreover" with simpler alternatives such as "also" or "in addition" to improve readability.

      We have changed several sentences according to this suggestion and hope that we have improved the readability of our manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) It should be explicitly confirmed earlier that complex structures are used for NUDT15 and ASPA when assessing rASA/WCN. Additionally, it would be interesting to see the effect that deriving the matrices using NUDT15 and ASPA monomers would have.

      We have commented on the use of NUDT15 and ASPA homodimer structures earlier in the revised manuscript (specifically, already in the subsection “Abundance scores correlate with the degree of residue solvent-exposure”).

      When residues are classified using monomer rather than dimer structures of NUDT15 and ASPA, there is a small effect on the resulting “buried” and “exposed” substitution matrices. Entries in this set of substitution matrices calculated using either monomer or dimer structures typically differ by less than 0.05, and only a single entry differs by more than 0.1. As expected, the “exposed” matrix tends to contain slightly larger numbers when derived from dimer structures than when derived from monomer structures, meaning that when the interface residues are included in the exposed residue category, the average abundance scores of the “exposed” matrix are lowered. For buried residues, the picture is more mixed, although the overall tendency is that the interface residues make the “buried” matrix contain smaller average abundance scores for dimer compared to monomer structures. These results generally support the use of dimer structures for the residue classification.

      We here show the differences between the substitution matrices calculated with dimer or monomer structures of NUDT15 and ASPA and using data for all six proteins in our combined VAMP-seq dataset (average_abundance_score_difference = average_abundance_score_dimers – average_abundance_score_monomers):

      Author response image 3.

      We have not explored these alternative matrices further.

      (2) While the supplemental analyses are rigorous, the abundance of various metrics being presented can be confusing, especially when they seem to differ in their result. For instance, the discussion of Figure S17 (paragraph starting 428) contains mentions of mean differences but then switches to correlations, while both are presented for all panels. The claim "The datasets thus mainly differ due to differences in substitution effects in buried environments." is well supported by the observed mean differences, but for Pearson's correlations the average panel A, B values of buried 0.421 vs exposed 0.427 are hardly different. Which of the metrics is more meaningful, and are both needed?

      We agree with the reviewer that the claim that “The datasets thus mainly differ due to differences in substitution effects in buried environments” is not well-supported by the r between the substitution matrices, and we have removed this claim from the text.

      Since some datasets share VAMP-seq score distribution features, while others do not, the absolute difference between scores or matrices may be relevant to check for some dataset pairs, while the r may be more relevant to check for other dataset pairs. Hence, we have included both metrics in Fig S17 (Fig S11 in the revised manuscript).

      (3) Lines 337-340 - does not feel like S7 is the topic, perhaps the authors meant Figure 2A, B? In general, the supplemental figure references are out of order and panel combinations are sometimes confusing.

      We have corrected the figure references and rearranged the supplemental figures so that they now occur in the correct order. We have looked through the panel combinations with clarity in mind, and hope that the current set of main and supplementary figures balances overview and detail.

      (4) Line 363 "are also are also".

      We have corrected this typo.