10,000 Matching Annotations
  1. Sep 2025
    1. What does this bot do that a normal person wouldn’t be able to, or wouldn’t be able to as easily?

      Since bots are automated social media accounts, mainly driven by lines of code, one thing they struggle with is replicating authentic human responses. For example, someone who buys follower bots on Instagram will often show a follower count that is out of proportion to their engagement. Their follower count may be in the millions, but their likes and comments underperform because these bots were not programmed to interact with content as well. In addition, responses from AI bots like ChatGPT are essentially an accumulation of data from the web or data fed to them, so their answers cannot be considered authentic human-based knowledge.

    2. Why do you think social media platforms allow bots to operate?

      There are a couple of reasons. One is that bots tend to be good for engagement if they are deployed in the right way. These companies make their money from advertising, so increasing engagement is exactly what they want. Another is that, to make a bot, you usually need to buy the rights to edit the app's code unless the company's software is open source.

  2. social-media-ethics-automation.github.io
    1. Scratch - Imagine, Program, Share. URL: https://scratch.mit.edu/ (visited on 2023-11-17).

      This is a programming platform and it was the first “language” I learned in high school. Unlike C or Python, Scratch uses a block-based system with visuals. Instead of typing code, you drag and drop blocks, making it more intuitive and visual. Fun story: when I was taking the class in high school, some of my friends in the Java programming class made fun of it, saying Scratch wasn’t “real programming.” But if you look at what’s happening in the back end, implementing something like Scratch is actually very challenging.

    1. The pace of adoption of digitally centred archaeological data and digitally facilitated archaeological practice has not been met by the adoption of discipline-wide standards related to archaeological ethics. The result of this mismatch in ethics and practice is the creation of archaeologists who utilize digital forms, but whose archaeology is ungrounded in frameworks that specifically consider the ethical burdens of digital tools, methodology, and theory.

      This is very interesting. I had never considered the ethical implications of using digital technologies in archaeology, but it is a conversation worth having. Archaeologists deal with very precious artefacts and have an ethical code to follow, yet the existing frameworks have not caught up with digital practice. We are seeing the same thing in many other industries, where technology is moving faster than policy.

    1. Reviewer #3 (Public review):

      Summary:

      The present work aimed to investigate the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, the authors examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to uncover whether distinct thalamic systems support different modes of learning (automatic survival mechanisms versus more deliberate processes) and to propose a hierarchical pulvinar model of fear conditioning. They also try to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in it.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      a) fMRI data have low resolution (3 cubic mm), which certainly limits the examination of small nuclei such as the ones investigated here, and especially the examination of the LGN and inferior pulvinar.

      b) fMRI data were normalized to standard space. Analyzing the data in individual-subject space would have given the authors the option of avoiding altering every participant's brain and of using a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      c) On top of the two previous points, the authors decided to smooth the data to 6 mm, which means that every single voxel within these small nuclei was blurred/mixed with the 2 immediately contiguous voxels (if they followed the standard SPM12 normalization resampling default, which resamples, or upsamples the data in this case, to 2 x 2 x 2 mm). Given the strong changes in structural connectivity and function that can occur, especially in the thalamus, on voxels of this size, this and the previous 2 decisions do not favor anatomical precision.

      d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      (3) It is not clear either, why, given the large sample size, some of the results were not conducted using reproducibility strategies such as dividing the sample into 2 or 3 groups or using further cross-validation strategies.

      (4) Limited testing of alternative hypotheses. The results clearly seem to be a selection of the findings supporting the hypotheses that the authors sought to confirm. (Just one example: in the analysis reported in Figures 1-2, are there other correlations between the activation of the anterior pulvinar and MD with other pulvinar nuclei? Only the MD-anterior pulvinar correlation is reported.)

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is not an exception. Better to tell the limitations to the readers upfront so they can factor them into their evaluation of the relevance of the manuscript and reported evidence.

      (6) Data should be made available to the scientific community. Code too. Even if you just used standard fMRI toolboxes, any code used to run analyses will be helpful to the community, or if someone decides to try to replicate your findings.

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field to start characterizing and conceptualizing the involvement of thalamic nuclei and their interactions with other brain regions in the associative threat learning circuitries. It also paves the road for further testing of the functional dynamics among these regions and circuitries, and modeling testing.
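
      Weakness (1c) above rests on simple kernel arithmetic, which a short sketch can make concrete. The Python snippet below is only an illustration, assuming an isotropic Gaussian smoothing kernel specified by its full width at half maximum (FWHM) and the 2 mm voxel size that the reviewer mentions as the SPM12 resampling default. The function names are hypothetical, and the calculation is one-dimensional and continuous, so it ignores the other two axes and the discrete sampling of the kernel.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def center_voxel_weight(fwhm_mm, voxel_mm):
    """Share of a 1-D Gaussian kernel's mass that stays inside the voxel it is centered on."""
    sigma = fwhm_to_sigma(fwhm_mm)
    half_width = voxel_mm / 2.0
    # Integral of a normal density over [-half_width, +half_width].
    return math.erf(half_width / (sigma * math.sqrt(2.0)))


# Assumed values taken from the review text: 6 mm kernel, 2 mm voxels.
sigma_mm = fwhm_to_sigma(6.0)
fwhm_in_voxels = 6.0 / 2.0
own_share = center_voxel_weight(6.0, 2.0)

print(f"sigma = {sigma_mm:.2f} mm")
print(f"FWHM spans {fwhm_in_voxels:.0f} voxels")
print(f"1-D share contributed by the voxel itself = {own_share:.2f}")
```

      Under these assumptions, well under half of each smoothed value along a single axis comes from the voxel itself, which is the kind of mixing the reviewer is concerned about for small nuclei.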

    1. Abstract. Background: Reference genomes for the entire sea turtle clade have the potential to reveal the genetic basis of traits driving the ecological and phenotypic diversity in these ancient and iconic marine species. Furthermore, these genomic resources can support conservation efforts and deepen our understanding of their unique evolution. Results: We present haplotype-resolved, chromosome-level reference genomes and high-quality gene annotations for five sea turtle species. This completes the catalog of reference genomes of the entire sea turtle clade when combined with our previously published reference genomes. Our analysis reveals remarkable genome synteny and collinearity across all species, despite the clade’s origin dating back more than 60 million years. Regions of high interspecific genetic distance and intraspecific genetic diversity are consistently clustered in genomic hotspots, which are enriched with genes coding for immune response proteins, olfactory receptors, zinc fingers, and G-protein-coupled receptors. These hotspot regions may offer insights into the genetic mechanisms driving phenotypic divergence among species, and represent areas of significant adaptive potential. Ancient demographic analysis revealed a synchronous population expansion among sea turtle species during the Pleistocene, with varying magnitudes of demographic change, likely shaped by their diverse ecological adaptations and biogeographic contexts. Conclusions: Our work provides genomic resources for exploring genetic diversity, evolutionary adaptations, and demographic histories of sea turtles. We outline genomic regions with increased diversity, linked to immune response, sensory evolution, and adaptation to varying environments that have historically been subject to strong diversifying selection, and likely will underpin sea turtles’ responses to future environmental change. These reference genomes can assist conservation by providing insights into the demographic and evolutionary processes that sustain and threaten these iconic species.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf105), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Laura Caquelin

      1. Summary of the Study: The authors aimed to create high-quality reference genomes for five sea turtle species to better understand their genetic diversity, evolutionary adaptations, and ecological traits. They used haplotype-resolved, chromosome-level reference genomes and gene annotations to reveal conserved genome structures, genetic hotspots linked to immune response and sensory evolution, and patterns of demographic expansion. Their findings highlight areas of genetic diversity critical for adaptation and conservation efforts.

      2. Scope of reproducibility

      According to our assessment the primary objective is: Investigation of multi-copy gene family enrichment in genomic hotspots of sea turtles.

      • Outcome: Significant enrichment of "MHC", "Immunology-related", "G-Protein Coupled Receptor" (GPCR), "Olfactory Receptor" or "Zinc-Finger" in genomic hotspots with high genetic divergence, diversity, and gene density.
      • Analysis method outcome: Fisher's exact test followed by Benjamini-Hochberg correction
      • Main result: "Following functional annotation of the genes found in these hotspots, we found enrichment for multi-copy gene families coding for proteins with functions in immune response, olfactory receptors (ORs), zinc fingers, and G-protein-coupled receptors (GPCRs) (Fig 4c, Tables S6 & S7). This included enrichment of immunology-related genes, GPCRs, ORs, and Zinc-finger genes in chromosome 13 (adjusted p < 10^-42, 10^-47, 10^-79, 0.01, respectively), MHC genes, Immunology-related genes, GPCRs, ORs, and Zinc-finger genes in chromosome 14 (adjusted p < 10^-24, 10^-6, 10^-2, 10^-10, 10^-52, respectively) and Immunology-related genes and GPCRs in chromosome 24 (adjusted p < 10^-3 and 10^-3, respectively)." (page 10).
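
      For readers unfamiliar with the analysis method named above, the following Python sketch illustrates Fisher's exact test followed by Benjamini-Hochberg correction. It is not the authors' "enrichment_test.R": the function names, the one-sided (over-representation) form of the test, and all gene counts are assumptions made up for illustration.

```python
from math import comb


def fisher_enrichment_p(hot_family, hot_other, bg_family, bg_other):
    """One-sided Fisher's exact test (over-representation) on a 2x2 table."""
    row_hot = hot_family + hot_other
    row_bg = bg_family + bg_other
    col_family = hot_family + bg_family
    total = row_hot + row_bg
    upper = min(row_hot, col_family)
    # Sum hypergeometric probabilities of tables at least as extreme.
    numerator = sum(
        comb(row_hot, k) * comb(row_bg, col_family - k)
        for k in range(hot_family, upper + 1)
    )
    return numerator / comb(total, col_family)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts per gene family:
# (family genes in hotspot, other genes in hotspot,
#  family genes elsewhere, other genes elsewhere)
toy_counts = {
    "family_A": (12, 88, 30, 4870),
    "family_B": (3, 97, 140, 4760),
    "family_C": (6, 94, 60, 4840),
}
raw_p = {name: fisher_enrichment_p(*c) for name, c in toy_counts.items()}
adjusted_p = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
for name in toy_counts:
    print(f"{name}: p = {raw_p[name]:.3g}, BH-adjusted p = {adjusted_p[name]:.3g}")
```

      The correction is applied across all tests at once, which is why the order of the adjusted values relative to the table layout matters when results are reshaped afterwards.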

      3. Availability of Materials

      a. Data

      • Data availability: Open
      • Data completeness: Complete
      • Access Method: Repository
      • Repository: https://git.imp.fu-berlin.de/begendiv/sea_turtlegenomes
      • Data quality: The data files have been shared and appear sufficient for running the analyses. However, no metadata is provided to describe the content, structure, or origin of the files, which limits interpretability and reusability.

      b. Code
      • Code availability: Open
      • Programming Language(s): R (for the enrichment test)
      • Repository link: https://git.imp.fu-berlin.de/begendiv/sea_turtlegenomes
      • License: MIT license
      • Repository status: Public
      • Documentation: Short README that describes only the layout of the directory.

      4. Computational environment of reproduction analysis

      • Operating system for reproduction: MacOS 14.7.4

      • Programming Language(s): R
      • Code implementation approach: Using shared code
      • Version environment for reproduction: R version 4.4.1/RStudio 2024.09.0

      5. Results

      5.1 Original study results

      Results 1: The main results are presented in Figure 4, and the numerical p-values are available in Supplementary Tables S6 and S7.

      5.3 Steps for reproduction
      -> Run the code "enrichment_test.R" shared on Git
      - Issue 1: Files needed to run the code are not shared in the Git repository: "GCF_009764565.3_rDerCor1.pri.v4_genomic.longest.aa.tsv", "hotspots_chr13.longest.aa.tsv", "hotspots_chr14.longest.aa.tsv", "hotspots_chr24.longest.aa.tsv".
      -- Resolved: These analysis data were not shared on the internal GigaScience FTP server or in the Git repository. After request, the authors uploaded all the files to the Git repository.

      5.4 Statistical comparison Original vs Reproduced results
      - Results: Tables S6 and S7 were reproduced:
      -- Supplementary table S6: see screenshot from R console
      -- Supplementary table S7: see screenshot from R console

      • Comments: The original R code "enrichment_test.R" simply stored the p-values results in a value object. To simplify the comparison process, directly obtain the final table, and ensure reproducibility while minimizing errors, we implemented the creation of the table.

      ------------------ Start of R code ------------------
      # Creating final tables
      # Corresponding to supplementary table S6
      table_S6 <- data.frame(
        enrichment = c("MHC", "Immunology", "GPCR", "Olfactory", "Zinc-finger"),
        Chr13 = c(p_mhc13, p_immune13, p_gpcr13, p_or13, p_zinc13),
        Chr14 = c(p_mhc14, p_immune14, p_gpcr14, p_or14, p_zinc14),
        Chr24 = c(p_mhc24, p_immune24, p_gpcr24, p_or24, p_zinc24))

      # Corresponding to supplementary table S7
      # Create a vector of names for rows and columns
      # (! warning: the p-values in fdrs are not in the same order as in table S7)
      enrichment <- c("MHC", "Olfactory", "GPCR", "Immunology", "Zinc-finger")
      chromosomes <- c("Chr13", "Chr14", "Chr24")

      # Reorganizing fdrs in a matrix
      table_S7 <- matrix(fdrs, nrow = length(enrichment), byrow = TRUE)
      rownames(table_S7) <- enrichment
      colnames(table_S7) <- chromosomes

      # Organizing rows as the original table S7
      library(dplyr)
      table_S7 <- as.data.frame(table_S7) # Convert matrix to data frame
      table_S7 <- table_S7 %>% slice(match(c("MHC", "Immunology", "GPCR", "Olfactory", "Zinc-finger"), enrichment))
      ------------------- End of R code -------------------

      • Errors detected: The statement "MHC genes, Immunology-related genes, GPCRs, ORs, and Zinc-finger genes in chromosome 14 (adjusted p < 10^-24, 10^-6, 10^-2, 10^-10, 10^-52, respectively)" (page 10) appears to contain an error. Specifically, the adjusted p-value for Olfactory Receptors (5.583367e-10) is greater than the stated threshold of 10^-10, so it can only be reported as below 10^-9. The threshold for Olfactory Receptors should therefore be revised to 10^-9.

      • Statistical Consistency: The p-values are consistent (see screenshot from R console).
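
      The threshold error noted above can be checked mechanically. The Python sketch below uses the p-value quoted in this report; the helper name is hypothetical, and it simply finds the smallest power of ten that is strictly greater than a given p-value.

```python
import math


def tightest_exponent(p_value):
    """Smallest integer e such that p_value < 10**e."""
    return math.floor(math.log10(p_value)) + 1


# Adjusted p-value for Olfactory Receptors on chromosome 14, as quoted in the report.
p_or14 = 5.583367e-10

print(p_or14 < 1e-10)             # the claim made in the manuscript text
print(p_or14 < 1e-9)              # the claim supported by the reproduced value
print(tightest_exponent(p_or14))  # exponent of the tightest valid bound
```

      Running the same helper over every value in the reproduced tables would be one way to confirm that each "p < 10^-k" statement in the text is the tightest one the data support.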

      6. Conclusion

      • Summary of the computational reproducibility review: The inferential statistics for the objective "Investigation of multi-copy gene family enrichment in genomic hotspots of sea turtles" were successfully reproduced using the original analysis code provided by the authors. The input data needed to run the code were initially unavailable but were subsequently shared through the Git repository. An inconsistency was noted in the text of the manuscript reporting a threshold for Olfactory Receptors, where the stated 10^-10 should be revised to 10^-9 based on the observed p-value (5.583367e-10).

      • Recommendations for authors: While the original analysis code was successfully used to reproduce the results, we recommend improving the documentation to enhance clarity and reproducibility. In particular:
      -- Code annotation: The scripts would benefit from more detailed comments within the code to clarify the logic of each step. This would greatly help users follow the analyses more easily and understand the purpose of specific commands or operations.
      -- README file: The current README provides only a general overview. We suggest expanding it to include:
      --- A brief description of each script or analysis pipeline.
      --- An indication of which figure, table, or result in the manuscript each script corresponds to.
      --- Clear instructions on how to execute the analyses in the correct order, if applicable.
      -- Metadata: For the datasets used or generated by the scripts, it would be helpful to include accompanying metadata files that explain:
      --- The definition of each variable name.
      --- The origin of each dataset (raw, processed, etc.).
      --- Any preprocessing steps applied before analysis.
      -- Data availability: At this stage, we have only verified the reproducibility of one part of the study. To facilitate full reproducibility of the entire study, we recommend sharing all necessary data files required to run every script present in the repository.

      These improvements would make the repository significantly more user-friendly and would strengthen the reproducibility of the study.

    1. Abstract. The vast majority of cancers exhibit Somatic Copy Number Alterations (SCNAs): gains and losses of variable regions of DNA. SCNAs can shape the phenotype of cancer cells, e.g. by increasing their proliferation rates, removing tumor suppressor genes, or immortalizing cells. While many SCNAs are unique to a patient, certain recurring patterns emerge as a result of shared selectional constraints or common mutational processes. To discover such patterns in a robust way, the size of the dataset is essential, which necessitates combining SCNA profiles from different cohorts, a non-trivial task. To achieve this, we developed CNSistent, a Python package for imputation, filtering, consistent segmentation, feature extraction, and visualization of cancer copy number profiles from heterogeneous datasets. We demonstrate the utility of CNSistent by applying it to the publicly available TCGA, PCAWG, and TRACERx cohorts. We compare different segmentation and aggregation strategies on cancer type and subtype classification tasks using deep convolutional neural networks. We demonstrate an increase in accuracy over training on individual cohorts and efficient transfer learning between cohorts. Using integrated gradients we investigate lung cancer classification results, highlighting SOX2 amplifications as the dominant copy number alteration in lung squamous cell carcinoma.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf104), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 3: Sampsa Hautaniemi

      Streck and Schwarz present a method, CNSistent, for consistent segmentation of copy-number data. The utility of the tool is demonstrated using three large cancer cohorts and a neural network classifier built upon the consistently segmented data. CNSistent can facilitate solving an important biomedical problem: the advanced analysis of copy-number data. The authors are lauded for their excellent Python code and thorough documentation. While the contribution is timely and likely important, there are several areas for improvement.

      The manuscript's readability could be better. There are typos, textual errors, and inconsistencies in figure captions, such as incorrect figure references or mismatched values between the text and figures. The "Consistent Segmentation" section is difficult to follow. It is unclear whether this step involves merging pre-existing breakpoints in the data to produce new, longer segments or if larger segments, such as whole chromosomes, are split into smaller, constant-sized segments. The writing suggests that segments are first merged and then split; however, later in the manuscript, they appear to be used separately. In our testing, combining these approaches did not yield meaningful results. Since consistent segmentation is the method's most critical step, we strongly suggest clarifying this section.
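
      The two readings of "consistent segmentation" contrasted above (merging existing breakpoints versus cutting regions into constant-sized bins) can be illustrated with a toy Python sketch. This is not CNSistent's API or algorithm: the function names, the merge tolerance, the bin size, and all coordinates are hypothetical and chosen only to show how the two steps differ and what happens when they are chained.

```python
def merge_breakpoints(per_sample_breakpoints, tolerance):
    """Pool breakpoints from all samples and collapse runs lying within `tolerance` of the run's first point."""
    pooled = sorted({bp for bps in per_sample_breakpoints for bp in bps})
    merged, cluster = [], []
    for bp in pooled:
        if cluster and bp - cluster[0] > tolerance:
            merged.append(sum(cluster) // len(cluster))
            cluster = []
        cluster.append(bp)
    if cluster:
        merged.append(sum(cluster) // len(cluster))
    return merged


def to_segments(start, end, breakpoints):
    """Turn interior breakpoints into contiguous (start, end) segments."""
    edges = [start] + [bp for bp in breakpoints if start < bp < end] + [end]
    return list(zip(edges[:-1], edges[1:]))


def split_fixed(start, end, bin_size):
    """Cut [start, end) into bins of at most `bin_size` bases."""
    return [(s, min(s + bin_size, end)) for s in range(start, end, bin_size)]


# Toy breakpoints for three samples on a 1000-base region.
samples = [[100, 500, 900], [110, 520], [905]]

merged_only = to_segments(0, 1000, merge_breakpoints(samples, tolerance=50))
split_only = split_fixed(0, 1000, bin_size=400)
merged_then_split = [b for s, e in merged_only for b in split_fixed(s, e, 400)]

print(merged_only)        # [(0, 105), (105, 510), (510, 902), (902, 1000)]
print(split_only)         # [(0, 400), (400, 800), (800, 1000)]
print(merged_then_split)  # [(0, 105), (105, 505), (505, 510), (510, 902), (902, 1000)]
```

      In this toy case, chaining the two steps leaves a very short leftover bin (505 to 510), which shows why the order and combination of the steps need to be spelled out explicitly.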

      The manuscript is unbalanced in its content, with excessive focus on the tool's application and the discoveries derived from it, rather than on the tool itself. This reduces the clarity of the key message. We recommend compressing the application section (deep learning in cancer classification) while expanding the tool description with additional explanations.

      It is also unclear what type of data the authors are using in the cancer classification section. To improve clarity, this information should be explicitly included in the methods section, detailing the sequencing strategy and copy-number tools used for each cohort.

      The methods section would benefit from a more detailed explanation of the CNSistent steps. Both Figure 1 and the text leave some parts unclear, particularly in the "Consistent Segmentation" section. Additionally, methods such as random forest and UMAP are only briefly mentioned in a supplementary figure rather than being described in the methods section. Moving these descriptions to the methods section would improve clarity.

      Figures are generally clear, but improving color differentiation would be beneficial. For example, in Figure 1, the dark red and dark orange shades are too similar, making them difficult to distinguish. A more optimized color scheme with slightly lighter tones (i.e., increased luminance) would enhance readability.

      The introduction promotes copy-number signatures; however, these signatures rely on segment lengths and unique breakpoints, which vary between samples. Since this method enforces consistent segmentation and breakpoints across all samples, its applicability to copy-number signatures is unclear. This should be discussed in the Discussion section or removed from the introduction.

      Out of curiosity: Is it possible to prioritize one type of segmentation over another? For instance, if both WGS and WES data are available, can CNSistent be configured to prioritize WGS calls? Similarly, some tools provide highly precise breakpoint calls that are valuable for detecting fusion genes or rearrangements. In such cases, it would be useful to prioritize these calls and harmonize results from other tools accordingly.

      Terminology Clarifications:

      Blacklist, blacklisted regions, gap regions, mask: These terms should be used consistently, particularly since blacklists can be applied at different processing stages. Notably, PCAWG blacklists samples, not regions.

      Segmentation: The term is commonly used in CNV analysis to refer to inferring continuous genomic segments from raw read counts or probe intensities. Here, it has a slightly different meaning (computing consistent breakpoints across all samples), so a more explicit definition would be helpful.

      Breakpoint merging/clustering: If these terms are synonymous, choosing one would improve readability.

      Coverage: Since "coverage" often refers to sequencing depth, a critical quality metric in DNA sequencing, it might be clearer to use "copy-number coverage" or a similar term. For example, the sentence "Next, samples with low coverage were removed using the…" could be ambiguous if read without context.

      At the end of the subsection "Explainability and the Effect of SOX2 Gene," the phrase "which exhibits significant local amplification in LUSC" should be revised to "which exhibits significant focal amplification in LUSC." The correct terminology is "focal" rather than "local," as established in Beroukhim et al. (2010).

    1. Abstract. The ability to differentiate between viable and dead microorganisms in metagenomic data is crucial for various microbial inferences, ranging from assessing ecosystem functions of environmental microbiomes to inferring the virulence of potential pathogens from metagenomic analysis. While established viability-resolved genomic approaches are labor-intensive as well as biased and lacking in sensitivity, we here introduce a new fully computational framework that leverages nanopore sequencing technology to assess microbial viability directly from freely available nanopore signal data. Our approach utilizes deep neural networks to learn features from such raw nanopore signal data that can distinguish DNA from viable and dead microorganisms in a controlled experimental setting of UV-induced Escherichia cell death. The application of explainable AI tools then allows us to pinpoint the signal patterns in the nanopore raw data that allow the model to make viability predictions at high accuracy. Using the model predictions as well as explainable AI, we show that our framework can be leveraged in a real-world application to estimate the viability of obligate intracellular Chlamydia, where traditional culture-based methods suffer from inherently high false negative rates. This application shows that our viability model captures predictive patterns in the nanopore signal that can be utilized to predict viability across taxonomic boundaries. We finally show the limits of our model’s generalizability through antibiotic exposure of a simple mock microbial community, where a new model specific to the killing method had to be trained to obtain accurate viability predictions.
While the potential of our computational framework’s generalizability and applicability to metagenomic studies needs to be assessed in more detail, we here demonstrate for the first time the analysis of freely available nanopore signal data to infer the viability of microorganisms, with many potential applications in environmental, veterinary, and clinical settings.
Author summary. Metagenomics investigates the entirety of DNA isolated from an environment or a sample to holistically understand microbial diversity in terms of known and newly discovered microorganisms and their ecosystem functions. Unlike traditional culturing of microorganisms, genomic approaches are not able to differentiate between viable and dead microorganisms since DNA might persist under different environmental circumstances. The viability of microorganisms is, however, of importance when making inferences about a microorganism’s metabolic potential, a pathogen’s virulence, or an entire microbiome’s impact on its environment. As existing viability-resolved genomic approaches are labor-intensive, expensive, and lack sensitivity, we here investigate our hypothesis if freely available nanopore sequencing signal data that captures DNA molecule information beyond the DNA sequence might be leveraged to infer such viability. This hypothesis assumes that DNA from dead microorganisms accumulates certain damage signatures that reflect microbial viability and can be read from nanopore signal data using fully computational frameworks. We here show first evidence that such a computational framework might be feasible by training a deep model on controlled experimental data to predict viability at high accuracy, exploring what the model has learned, and using it in a real-world application by application to a bacterial species of veterinary relevance. We finally show that a specific model has to be trained to accurately predict viability after antibiotic exposure of a mock microbial community.
While the generalizability of our computational framework therefore needs to be assessed in much more detail, we here demonstrate that freely available data might be usable for relevant viability inferences in environmental, veterinary, and clinical settings.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf100), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Jakob Wirbel

      Summary: Urel and colleagues present a novel computational method to predict viability from metagenomic sequencing data, using the Nanopore squiggle as input. The manuscript is well-written and presents an interesting new application, bolstered in particular by the application of explainable AI. However, I have some concerns regarding the generalizability of their method, detailed below.

      Major: The way the authors try to exclude contamination in their C. abortus experiment is not optimal, since contaminants might be at low abundance and therefore not assemble well (especially with the relatively low sequencing output overall). Instead, it would be better to map reads against the reference genome for C. abortus and check whether reads predicted to be viable map or remain unmapped in this test. Maybe viable reads instead map against a database of known contaminants, like skin-resident microbes or other known kit contaminants. (This could potentially bolster their model performance.)

      The authors claim that their method generalizes well from E. coli to C. abortus, which were killed in two different ways (UV and heat shock). However, if I understood correctly, their extracted DNA was left in the lab for 5 days. During this time, could exposure to sunlight have led to similar chemical reactions (meaning twists/kinks in the DNA as well as pyrimidine dimers)? This might be a point to discuss, or it could be easily tested by incubating the DNA of the heat-killed C. abortus in the dark.

      What is the time-frame of DNA degradation in which the model works best? The authors left the DNA for 5 days, but metagenomic samples are usually processed quite quickly. How would the model perform on samples that were only kept for 1 day after initial killing? At which time of incubation does the model not generalize anymore? For a potential application, it might be useful to know if DNA is viable or not, even if the cells died relatively recently (and in the dark).

      Code availability: The GitHub repository looks great, but as a potential user of their method, I would not want to train my own model. Is it possible to host the model, maybe on Zenodo, so that it could be more useful as an application?

      Minor:
      Lines 96-100 read a bit like a Nanopore commercial and are not really relevant for this paper.
      Line 182: shouldn't heat shock at 120 °C inactivate enzymes?
      Line 206: it is curious to keep the default cutoff just because the results are fine. Why not optimize the F1 score, for example? Fig1B seems to indicate that a probability threshold of 0.48 or something would give a higher F1 score. The decision to keep the threshold at the default value seems arbitrary.
      Line 275: interesting hypothesis. Did you observe quicker decay of pore viability in the dead versus the alive run? Could you provide the pore scan information over the time of the sequencing run as a supplement, maybe, to back up this hypothesis?
      Line 311: the number does not match the one in the table.
      Line 331: the dead reads are very short. Could you compare just the length of the reads with the viability predictions? Are shorter reads more likely to be predicted to be non-viable?
      Fig 3a: what does normalized count mean? How about a standard histogram or density plot?
      Line 442: The most recent version of dorado is v0.8.2; did you mean v0.4.2? Please adjust.
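
      The comment on Line 206 (optimizing the F1 score rather than keeping the default cutoff) can be sketched in a few lines of Python. The labels, predicted probabilities, and candidate thresholds below are toy values invented for illustration and are unrelated to the manuscript's data; the function names are hypothetical.

```python
def f1_at_threshold(labels, probs, threshold):
    """F1 score when reads with probability >= threshold are called positive."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_f1_threshold(labels, probs, candidates):
    """Return the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_at_threshold(labels, probs, t))


# Toy validation set: 1 = positive class, 0 = negative class.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.90, 0.70, 0.49, 0.30, 0.45, 0.20, 0.60, 0.55]
candidates = [0.40, 0.44, 0.48, 0.50, 0.52, 0.56]

for t in candidates:
    print(f"threshold {t:.2f}: F1 = {f1_at_threshold(labels, probs, t):.3f}")
print("best threshold:", best_f1_threshold(labels, probs, candidates))
```

      Any such tuning would normally be done on held-out validation data rather than on the test set, so that the reported F1 score is not optimistically biased.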

    2. Abstract: The ability to differentiate between viable and dead microorganisms in metagenomic data is crucial for various microbial inferences, ranging from assessing ecosystem functions of environmental microbiomes to inferring the virulence of potential pathogens from metagenomic analysis. While established viability-resolved genomic approaches are labor-intensive as well as biased and lacking in sensitivity, we here introduce a new fully computational framework that leverages nanopore sequencing technology to assess microbial viability directly from freely available nanopore signal data. Our approach utilizes deep neural networks to learn features from such raw nanopore signal data that can distinguish DNA from viable and dead microorganisms in a controlled experimental setting of UV-induced Escherichia cell death. The application of explainable AI tools then allows us to pinpoint the signal patterns in the nanopore raw data that allow the model to make viability predictions at high accuracy. Using the model predictions as well as explainable AI, we show that our framework can be leveraged in a real-world application to estimate the viability of obligate intracellular Chlamydia, where traditional culture-based methods suffer from inherently high false negative rates. This application shows that our viability model captures predictive patterns in the nanopore signal that can be utilized to predict viability across taxonomic boundaries. We finally show the limits of our model’s generalizability through antibiotic exposure of a simple mock microbial community, where a new model specific to the killing method had to be trained to obtain accurate viability predictions.
While the potential of our computational framework’s generalizability and applicability to metagenomic studies needs to be assessed in more detail, we here demonstrate for the first time the analysis of freely available nanopore signal data to infer the viability of microorganisms, with many potential applications in environmental, veterinary, and clinical settings.

Author summary: Metagenomics investigates the entirety of DNA isolated from an environment or a sample to holistically understand microbial diversity in terms of known and newly discovered microorganisms and their ecosystem functions. Unlike traditional culturing of microorganisms, genomic approaches are not able to differentiate between viable and dead microorganisms since DNA might persist under different environmental circumstances. The viability of microorganisms is, however, of importance when making inferences about a microorganism’s metabolic potential, a pathogen’s virulence, or an entire microbiome’s impact on its environment. As existing viability-resolved genomic approaches are labor-intensive, expensive, and lack sensitivity, we here investigate our hypothesis that freely available nanopore sequencing signal data that captures DNA molecule information beyond the DNA sequence might be leveraged to infer such viability. This hypothesis assumes that DNA from dead microorganisms accumulates certain damage signatures that reflect microbial viability and can be read from nanopore signal data using fully computational frameworks. We here show first evidence that such a computational framework might be feasible by training a deep model on controlled experimental data to predict viability at high accuracy, exploring what the model has learned, and using it in a real-world application to a bacterial species of veterinary relevance. We finally show that a specific model has to be trained to accurately predict viability after antibiotic exposure of a mock microbial community.
While the generalizability of our computational framework therefore needs to be assessed in much more detail, we here demonstrate that freely available data might be usable for relevant viability inferences in environmental, veterinary, and clinical settings.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf100), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Finlay Maguire

      In this paper the authors train a ResNet-based model to predict whether individual 10,000 sample chunks of nanopore signal data originate from live or killed bacterial isolate cultures. From live and UV-killed (at exponential phase) E. coli K-12 cultures DNA was extracted and sequenced using separate R10.4.1 flowcells on a MinION. Signal data from each read in the live and dead extractions were then processed by discarding the first 1,500 samples and dividing the remaining signals into 10,000 sample chunks. These were then split into a balanced 60:20:20 train, test, and validation datasets with the constraint that no two chunks from the same read would end up in the same dataset (e.g., chunk 1 and chunk 2 of 1st read in the killed culture would hypothetically be separated into train and test). During this they also explored/compared the impact of chunk size, model architecture, and performance of a sequence based model using the E. coli data. With a nicely performed class-activation map and masking approach they then identified the signal regions most strongly associated with dead-predictions (such as twisting/kinking/pore blockage of DNA around pyrimidine dimers). Finally, they applied their trained model to a live and heat-killed Chlamydia abortus culture and compared their results to stained microscopy and propidium monoazide PCR measures of viability. They found equivalent performance on the C. abortus data to their E. coli data (despite a different killing-method and taxa).
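      The preprocessing summarized above (discarding the first 1,500 samples of each read, then cutting the remainder into 10,000-sample chunks) can be sketched roughly as follows. This is an illustration only: the function name is hypothetical, the signal is treated as a plain Python sequence, and dropping a trailing partial chunk is an assumption rather than something confirmed by the manuscript.

```python
def chunk_signal(signal, skip=1500, chunk_size=10000):
    """Drop the first `skip` samples, then cut the rest into full-size chunks.

    Assumption: a trailing remainder shorter than `chunk_size` is discarded.
    """
    trimmed = signal[skip:]
    n_chunks = len(trimmed) // chunk_size
    return [trimmed[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]


# Toy read of 32,000 samples: 30,500 remain after trimming, giving 3 full chunks.
toy_read = list(range(32000))
toy_chunks = chunk_signal(toy_read)
```

      Under these assumptions, reads shorter than 11,500 samples yield no chunks at all, which is one reason why read length and chunk count are worth reporting alongside chunk-level accuracy.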

      The manuscript is well written and the methods are clearly described (including well documented code and deposited data). The authors' explainability methodology is excellent, although it would have been nice to see a bit more in-depth interpretation of those results. The authors have also presented a convincing case that nanopore signal data does contain information that can be used to distinguish signal chunks from live and dead bacterial monocultures. This method has the potential to be useful in clinical and environmental genomics if it can be extended to more heterogeneous metagenomic samples. However, despite the title and framing of this manuscript (i.e., "metagenomics"), their analyses do not involve any metagenomic data and their results so far do not demonstrate whether this is feasible. Currently, the overall framing (and title) of the manuscript is not appropriate given the work performed at this point. Similarly, given that both E. coli and C. abortus "dead" cultures resulted in a median read length less than half that of the live cultures, the authors do not fully make the case that the signal and ResNet approach is actually required relative to simpler baseline models. Finally, although they did evaluate performance on a completely separate dataset, the authors should at least explore/quantify the correlation of live/dead prediction across chunks of the same read, given the default expectation of non-independence of signal chunks from the same read.

      Major - Although the title and framing of the paper suggest that the authors are classifying live and dead bacteria in metagenomic datasets, the actual experiments and method developed are entirely based around sequencing of cultured clonal bacterial isolates. Metagenomic datasets are going to have considerably more heterogeneity in viability, species composition, and DNA signal characteristics. Given this, the paper's title, introduction, and parts of the discussion are a bit of an oversell and inappropriate. This manuscript should be revised to more clearly reflect the work actually performed.

      • This paper doesn't establish whether a ResNet + Signal approach actually outperforms a much simpler baseline. For example, given that there are clear extraction and median read-length differences between live and dead samples, it is possible that a much simpler logistic model using basic features such as read length and/or translocation could perform equivalently.

      • Although the C. abortus analysis demonstrates limited impact of leakage, I'm still a bit concerned about the potential non-independence of chunks from the same read (i.e., chunk 1 and chunk 3 of the same read are more likely to share similar live/dead signal characteristics than chunk 1 and chunk 3 of different reads). By not having multiple chunks of the same read in the training, validation, or test datasets, the authors may have avoided issues with longer reads being more represented in their datasets. However, this has the potential to introduce data leakage between the train and test sets (which may impact generalisability when they attempt to extend this method to metagenomics). I think this paper would be improved by some exploration of the correlation of live/dead prediction across chunks of the same read. How often do different chunks of the same read disagree? How does this impact the overall performance of the model? Does taking the average prediction across chunks of the same read improve or degrade performance? Would this problem be better suited to a multiple instance learning approach (i.e., a live/dead label applied to all chunks from a single read), especially in more heterogeneous datasets? To what degree do longer reads with more chunks contribute disproportionately to the overall performance in the C. abortus dataset?
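      The questions about chunk-level agreement and averaging across chunks of the same read could be approached with a small aggregation step over the model output. The sketch below is hypothetical: it assumes the predictions are available as (read ID, probability of "dead") pairs, uses a 0.5 cutoff, and takes the plain mean as the read-level score; none of these choices come from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_read(chunk_preds, threshold=0.5):
    """Summarize chunk-level probabilities per read.

    chunk_preds: iterable of (read_id, prob_dead) pairs, one per chunk.
    Returns {read_id: {"n_chunks", "mean_prob", "read_call", "chunks_agree"}}.
    """
    by_read = defaultdict(list)
    for read_id, prob in chunk_preds:
        by_read[read_id].append(prob)

    summary = {}
    for read_id, probs in by_read.items():
        calls = [p >= threshold for p in probs]  # per-chunk dead/alive calls
        avg = mean(probs)
        summary[read_id] = {
            "n_chunks": len(probs),
            "mean_prob": avg,
            "read_call": avg >= threshold,  # True means the read is called dead
            "chunks_agree": all(calls) or not any(calls),
        }
    return summary


def disagreement_rate(summary):
    """Fraction of multi-chunk reads whose chunks do not all get the same call."""
    multi = [s for s in summary.values() if s["n_chunks"] > 1]
    if not multi:
        return 0.0
    return sum(1 for s in multi if not s["chunks_agree"]) / len(multi)
```

      Read-level performance can then be computed from `read_call` against the per-read label, and `n_chunks` gives a direct handle on how much long reads dominate the chunk-level metrics.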

      Minor

      • SRA records don't seem to be live yet (https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=1123127)

      • Are the actual pod5 files available?

      • Read-level performance should be analysed and reported.

      • Figure 1B: the test subplot numbers are almost too small to read; the subplot may benefit from being its own panel.

      • Plot axis labels are not always clear (e.g., Figure 3: percentage of what? Chunks or reads?). It would be nice to see consistent capitalisation of labels and legends.

      • Predictions on viable E. coli and viable C. abortus seem surprisingly similar (91.44% vs 91.34% viable and 8.56% vs 8.66% dead) despite different taxa, potentially different underlying viable cell proportions, and different output probability densities. This would benefit from further discussion/analysis: do misclassified chunks have any common characteristics? Would you expect the E. coli to have a similar microscopy/PCR-measured viability percentage to the C. abortus?

      • It would be good to see a bit more discussion/exploration of the impact of mixed live/dead cells given the ~37.6% viability measure in the C. abortus sample (e.g., how well do models perform with different ratios of live/dead reads?). This could potentially be achieved using in-silico spike-ins.
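      One way to read the in-silico spike-in suggestion is to subsample reads from the live and dead runs at a chosen ratio and check whether the predicted dead fraction tracks the known one. The sketch below only builds such a labelled mixture; the function name, the read identifiers, and the fixed seed are hypothetical, and it assumes each pool holds at least as many reads as are requested from it.

```python
import random


def spike_in_mixture(live_reads, dead_reads, dead_fraction, n_total, seed=0):
    """Draw a labelled mixture of reads with a known dead fraction.

    Reads are sampled without replacement, so each pool must contain at
    least as many reads as are drawn from it.
    """
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    n_dead = round(n_total * dead_fraction)
    n_live = n_total - n_dead
    mixture = [(read_id, "dead") for read_id in rng.sample(dead_reads, n_dead)]
    mixture += [(read_id, "live") for read_id in rng.sample(live_reads, n_live)]
    rng.shuffle(mixture)
    return mixture


# Toy pools of read identifiers standing in for the two sequencing runs.
live_pool = [f"live_{i}" for i in range(100)]
dead_pool = [f"dead_{i}" for i in range(100)]
mix = spike_in_mixture(live_pool, dead_pool, dead_fraction=0.25, n_total=40)
```

      Repeating this over a range of `dead_fraction` values and comparing the known fraction with the fraction predicted dead would give a simple calibration curve for mixed samples.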

    1. Webinar summary: The EVARS program, an essential tool for protecting children

      Summary

      This briefing summarizes the key points of the webinar organized by the national FCPE on 23 September 2025, devoted to the program for Education in Emotional and Relational Life and Sexuality (Éducation à la Vie Affective, Relationnelle et à la Sexualité, EVARS).

      In force since the start of the 2025 school year, the program aims to guarantee the effective application of the 2001 Aubry law, which made three annual sessions of sexuality education compulsory but was applied for only 15% of pupils in 2024.

      The speakers (Marc Pelletier of the Ministry of National Education, Sarah Durocher of Planning familial, and facilitator Didier Valentin) unanimously presented the program as necessary and indispensable for child protection.

      It responds to the fundamental missions of the school system: promoting equality, fighting discrimination, teaching consent, and preventing all forms of violence.

      The program is also a direct response to the contemporary challenges facing young people, notably early exposure to pornography, bullying, and sexist and sexual violence.

      Developed through a broad consultative process and upheld by the Conseil d'État, the program rests on three guiding principles: thematic unity, strict progression of content adapted to age, and complementarity with other teaching. It is compulsory, and parents cannot withdraw their children from it.

      Implementation relies on large-scale training of National Education staff and, in secondary education, on complementary sessions run by approved associations, always within projects designed jointly with the teaching teams.

      In the face of disinformation campaigns, the speakers stressed the need for clear communication with families to dispel misunderstandings and to reaffirm that the aim is not to teach sexual practices but to build a culture of respect, equality, and well-being.

      Context and Rationale for the EVARS Program

      A Legal Imperative and a Social Necessity

      The EVARS program was designed to address a major shortfall in the application of French law.

      Although the 2001 Aubry law made sexuality education compulsory at a rate of three sessions per year, an alarming finding was reported in 2024: only 15% of pupils had actually received it.

      The main objective of the new program is therefore to ensure that this law is effectively applied across the whole country.

      Marc Pelletier, of the Directorate-General for School Education (DGESCO), stressed that EVARS falls squarely within the fundamental missions that the Nation entrusts to the school system, as defined in the Education Code:

      Promoting equality, in particular between women and men.

      Fighting all forms of discrimination, including those based on sex, gender identity, or sexual orientation.

      Teaching the principle of consent and respect for the human body.

      Preventing all forms of violence, in particular sexist and sexual violence, and helping to identify situations of domestic violence, including incest.

      Responding to the Contemporary Challenges Facing Young People

      The program was judged indispensable for equipping children and adolescents to face the realities and risks of their time. Several alarming statistics were cited to justify its rollout:

      Issue: Key figure

      Exposure to pornography: 23 million minors are exposed to it each month.

      Sexual assault: One child becomes a victim every three minutes in France.

      Sexual violence against minors: 80% of victims are girls.

      School bullying: Affects 5% of primary school pupils, 6% of collège pupils, and 4% of lycée pupils, who find themselves in a vulnerable situation.

      Incest: 160,000 children are victims of it in France.

      For Sarah Durocher, president of Planning familial, structured and reliable education delivered at school is one of the main levers for countering the massive disinformation to which young people are exposed online.

      Support from Parent Federations and Associations

      The FCPE, which organized the webinar, expressed its support for the program "forcefully and with conviction".

      For the federation, EVARS is essential for informing, preventing harm, building a more equal society, enabling children to speak out, providing clear reference points, learning to say no, and understanding the notion of consent.

      The FCPE is also a member of the Collectif pour une véritable éducation à la sexualité (Collective for genuine sexuality education), alongside Planning familial and other organizations, in order to speak with one voice and to give families and schools concrete tools for countering disinformation.

      Development, Content, and Guiding Principles

      A Consultative and Validated Drafting Process

      The EVARS program was not created arbitrarily. Its development followed a rigorous, consultative process:

      1. Working group (2023): Set up to analyze the reasons for the weak application of the 2001 law.

      2. Referral to the Conseil Supérieur des Programmes (CSP): Minister Pap Ndiaye tasked the CSP with drafting a program, paying particular attention to the distinction between primary and secondary education.

      3. Consultations: The DGESCO carried out broad consultations on the basis of the CSP draft, involving education professionals, trade unions, and institutional partners, as well as a public consultation.

      4. Adoption (January 2025): The draft was adopted unanimously by those voting in the consultative bodies.

      5. Legal validation (June 2025): The Conseil d'État rejected two administrative appeals seeking its annulment, thereby confirming its legality and its "neutral and objective" character.

      Three Fundamental Principles

      The program is structured around three essential principles that ensure its coherence and suitability.

      1. Unity: At every level, teaching is organized around three structuring questions:

      ◦ How do I get to know myself, live, and grow up?
      ◦ How do I meet others, build respectful relationships with them, and flourish in those relationships?
      ◦ How do I find my place in society and be free and responsible within it?

      2. Progression: The most fundamental principle is the strict adaptation of content and methods to pupils' age and maturity. The very name of the program changes to mark this distinction:

      • Primary education (école): Education in Emotional and Relational Life (EVAR).
      • Secondary education (collège/lycée): Education in Emotional and Relational Life and Sexuality (EVARS).

      The word "sexuality" appears in the program only from the quatrième class (the third year of collège) onward.

      3. Complementarity: The three annual sessions form a coherent pathway.

      EVARS is designed to complement subject teaching (life and earth sciences (SVT), moral and civic education) and the school's wider educational initiatives (e.g., the anti-bullying program).

      A Progressive Approach Adapted to Each Age

      Level (designation): Topics covered

      Preschool (EVAR): Emotions, identifying parts of the body, the notion of privacy, recognizing trusted adults.

      Elementary school, CP-CM2 (EVAR): Feelings, sex stereotypes, fighting discrimination, consent (addressed without necessarily naming the term), dangers of the Internet, bullying.

      Collège (EVARS): Changes linked to puberty, private life, respect for privacy, romantic feelings, respect for differences, prevention of violence (sexual violence, coercive control).

      Lycée (EVARS): Commitment within a relationship, the right to be oneself, social acceptance and social pressure, building healthy relationships with oneself and with others.

      It is crucial to note that the term "sexuality" is understood in a broad sense, covering psychological, emotional, legal, and social dimensions, and not as a course on sexual practices.

      Practical Implementation and Pedagogy

      The Central Role of National Education Staff

      A large-scale training effort is under way to support school teams. It includes national seminars, regional (académie-level) trainers, and online training courses ("parcours magister") open to all teachers.

      Any teacher who volunteers can lead these sessions, not only SVT teachers.

      School health staff (nurses, psychologists) are key players.

      Their knowledge of the pupils makes it possible to adapt sessions to local issues.

      Clear protocols exist for receiving children's disclosures of violence, ensuring that a teacher is never left alone to deal with such situations.

      The Involvement of Approved Associations

      The use of outside partners is regulated:

      Recommended in secondary education: Sessions run by associations are encouraged in collège and lycée for the complementary expertise they bring.

      Not a priority in primary education: The ministry recommends that sessions be led by primary school teachers and integrated into everyday classroom life.

      Strict conditions:

      ◦ The association must be approved by the Ministry, a label guaranteeing that it respects the values of the Republic and that its pedagogical approach is sound.
      ◦ The session must be part of a pedagogical project designed jointly with the school's team.
      ◦ A member of the school's staff must always be present during the session.

      Planning familial, which works with 3,600 schools, stated that it turns down as many requests as it accepts, which illustrates the strong demand on the ground.

      A Typical Session: Didier Valentin's Approach

      Didier Valentin illustrated the active, non-judgmental pedagogy used during sessions.

      Philosophy: "Let's not try to convince; let's try to get people thinking." The aim is risk reduction and the development of critical thinking.

      Focus on the "relational": A large part of the work concerns how young people interact, talk to each other, and live together, well before sexuality is addressed.

      Interactive tools: The sessions are not lectures. They rely on participatory tools that start from young people's own experience:

      • Example 1: A board on which pupils stick Post-it notes listing the "advantages and disadvantages" of being a girl, a boy, or a non-binary person, to start a debate on stereotypes and empowerment.
      • Example 2: Showing short videos seen on social media (TikTok) to start a debate with opposing views and to analyze the discourse (e.g., masculinist content).

      Parents' Questions and the Fight Against Disinformation

      Regulatory Framework and Communication

      Compulsory nature: Participants were reminded that EVARS is compulsory teaching. A parent cannot request an exemption for their child.

      Informing families: The Ministry strongly recommends that schools communicate transparently about the program's objectives, for example at back-to-school meetings, in order to "dispel misunderstandings".

      Role of parent representatives: Parent representatives have a part to play in bodies such as the Committee for Education in Health, Citizenship and the Environment (CESCE), where they can take part in drawing up the school's project.

      Responding to Concerns and "Fake News"

      The speakers acknowledged the existence of a "moral panic" and of active disinformation campaigns. Sarah Durocher mentioned that some groups are trying to get elected as parent representatives with the aim of blocking the program.

      To reassure families, several points were driven home:

      Training of outside facilitators: Association staff are trained (e.g., 160 to 400 hours at Planning familial) and their criminal records are checked.

      Development of psychosocial skills: The program aims to strengthen pupils' emotional, cognitive, and relational skills, which drive academic success and well-being.

      A feminist education for all: Didier Valentin summed up the objective as a "feminist education" aimed at deconstructing gender stereotypes in order to create more equal relationships and, ultimately, to reduce violence.

    1. Summary of the Hearing on the Service Civique

      Summary

      The hearing of the president of the Agence du service civique (Civic Service Agency) highlights the two faces of a 15-year-old scheme: widely hailed as a "real success" by the Cour des Comptes (Court of Auditors) and popular with young people and host organizations, yet now threatened by drastic budget restrictions.

      With more than 868,000 participants since its creation, the Service Civique has established itself as a major tool for social cohesion and social mixing, and as a springboard into work and training for young people.

      However, the cancellation of appropriations for 2025 lowers the target from 150,000 to 135,000 young people, in effect eliminating 15,000 placements and weakening an associative ecosystem that is already under strain.

      The debates revealed broad consensus on the scheme's relevance, but also deep concerns about its funding, the risk of substitution for paid employment, allegations of ideological misuse, and the structural tension between its purpose of civic engagement and its de facto role in helping young people into work.

      1. The Service Civique: Track Record and Impact in Figures

      Created by the law of 10 March 2010, the Service Civique is a voluntary engagement scheme that has demonstrated significant impact over its 15 years of existence.

      Basics of the Scheme

      Target group: Young people aged 16 to 25 (up to 30 for young people with disabilities).

      Placement: A public-interest mission with associations or public institutions.

      Duration: About 6 months, with a maximum of 12 months.

      Intensity: In 2023, the average duration was 7 months, with a weekly commitment of 27 hours.

      Allowance: €620 per month.

      Benefits: Support, civic and citizenship training (including first aid), full social security coverage, and validation of quarters toward the basic retirement pension.

      Quantitative Track Record

      Total participants: 868,000 young people have completed a mission since 2010.

      Missions abroad: 15,000 young people carried out their mission internationally.

      Annual volume: Nearly 90,000 new missions were started in 2023.

      For 2024, the figure is 86,431 mission starts, which corresponds to meeting the annual target (before reduction) of 150,000 young people in Service Civique over the year.

      Occupancy rate: 100% of available places have been filled since 2023.

      Profile of Volunteers and Host Organizations

      The scheme is characterized by a strong mix of social backgrounds and personal trajectories.

      Category: Key data

      Profile at entry: 1/3 students, 1/3 job seekers, 1/3 inactive.

      Specific groups: 3.3% young people with disabilities; 14% young people from priority urban neighborhoods (QPV); 31% young people from rural areas.

      Host organizations: 62% in associations; 28% in the State and its agencies (e.g., Ministry of National Education); 9,000 different host organizations in total.

      Satisfaction Rates and Impact

      The Service Civique is a well-known and well-liked scheme, among both volunteers and recruiters.

      Awareness: More than 9 in 10 young people know of the scheme.

      Volunteer satisfaction: 85% of young people who completed a mission say they are satisfied.

      Recruiter satisfaction: Nearly 70% hold a favorable opinion.

      Impact on career paths:

      ◦ Professional: 73% of young people report having drawn on their experience in their professional path one year after leaving.
      ◦ Guidance: 63% used it to choose or change their course of study or career.
      ◦ Employment: 80% of young people are in work or training 6 months after the end of their mission.

      Impact on engagement: 56% of young people continue volunteering after their mission, compared with 36% before starting it.

      2. The Budget Crisis: A Turning Point for the Scheme

      The main threat hanging over the Service Civique is budgetary, and it calls into question both the political consensus and the scheme's growth trajectory.

      The Historic Target of 150,000 Young People

      Since 2017, a national consensus has formed around a target of 150,000 young people in Service Civique over the year, which corresponds to about 85,000 new mission starts per year, or slightly more than 10% of an age cohort.

      The initial finance law for 2025 provided the resources needed to meet this objective.

      The Impact of Cancelled Appropriations

      Cancellation for 2024: More than 70 million euros were cancelled.

      Cancellation decree for 2025: A decree lowered the target to 135,000 young people over the year, in effect eliminating 15,000 missions.

      Consequences for cash reserves: The Agency's cash reserve was cut from a prudential norm of one month to 15 days, then to an assumed 6 days (9 million euros) for 2025.

      Additional freeze ("surgel"): An additional freeze was applied; the minister hopes it will be partially lifted.

      Consequences for the Ecosystem

      The reduction in the number of missions has a twofold effect:

      1. For young people: 15,000 young people will be deprived of this opportunity, even though demand is already very high (3 applications recorded for every available mission).

      2. For associations: The reduction weakens the associative sector, which hosts the majority of volunteers and depends on their contribution.

      Several speakers stressed that associations, already facing cuts in subsidies, will see their capacity to act and to host volunteers diminished.

      3. Strategic Themes and Key Initiatives

      Despite the budget difficulties, the Agence du service civique is developing strategic priorities to respond to national priorities and to young people's aspirations.

      The New Thematic Priorities

      Ecological Service Civique: Launched in April 2024 with a goal of 50,000 missions by 2027. The first milestone of 1,000 additional missions has been exceeded, reflecting "real enthusiasm" among young people and the ecosystem.

      Service Civique Solidarité Senior: Developed as part of the "bien vieillir" (aging well) plan to respond to the societal challenges of an aging population.

      Fight against school bullying: 1,000 missions were dedicated to preventing and combating this scourge in schools, an example described as "archetypal" of a successful mission in which young people complement the work of public employees without replacing them.

      The Link with the Service National Universel (SNU)

      The abandonment of plans to make the SNU universal has had an impact. It had been anticipated that making it universal would massively increase demand for the Service Civique, raising the theoretical target to 25% of an age cohort.

      Abandoning the project avoids amplifying the current tension between supply and demand, but the question of the gap remains "cruelly posed".

      Rollout in Local Authorities

      The development of the Service Civique has historically relied on partnerships with large national associations.

      Rollout in local authorities remains an area for progress: only 192 of 1,254 intercommunal bodies hold an approval.

      Work has begun with Intercommunalité de France to make it easier to host volunteers at the local level, particularly in small municipalities.

      4. Controversies and Concerns Raised

      The hearing gave members of parliament the opportunity to voice several major criticisms and concerns about how the scheme operates and what it is for.

      The Risk of Job Substitution

      Concern: Some members (notably from the Écologiste group) fear that the Service Civique is being used to replace "real jobs", particularly in public services (e.g. reception duties).

      Agency's response: This is a "constant concern" and an essential one. The Code du service national prohibits it. The Agency carries out checks upstream (accreditation) and downstream (reports of abuse). The Agency's president notes that the risk of substitution is higher in the voluntary sports sector than in public services, where young people's satisfaction is also higher.

      Allegations of Misuse and Questions of Neutrality

      Concern: The Rassemblement National, citing an article in the Journal du Dimanche, raised the risk of the scheme being "misused" (dévoiement) for the benefit of "structures exclusively devoted to helping migrants" or "private Muslim schools", questioning whether republican neutrality is respected.

      Agency's response: The president firmly rejected these allegations, calling the article "poorly documented". She specified that the association La SIMAD has hosted only two volunteers since 2020 and that the association La Plume Bleue has never hosted any. She noted that the Agency works with the prefectural units for combating radical Islamism (CLIR) to strengthen checks.

      A Tool for Professional Integration Rather than Civic Engagement?

      Concern: One member (UDR group) argued that the scheme has turned into a "simple youth contract" (contrat jeune), serving professional integration more than civic engagement.

      He cited an INJEP study showing a correlation between the youth unemployment rate and take-up of the Service Civique, as well as wide territorial disparities (27.4% participation in the DROM, the overseas departments and regions, versus 9.5% in mainland France).

      Agency's response: The president acknowledges that professional motivations are self-evident and that the scheme is a "springboard to employment".

      She insists, however, that the experience goes beyond a "simple contract", because it offers a "concrete experience of the values of the Republic" and aims to "humanize public service".

      Inclusiveness and Accessibility

      Concern: The low participation rate of young people with disabilities (3.3%) was highlighted (Liot group).

      Agency's response: This figure is deemed "certainly insufficient" but is rising (+1.5 points in 4 years).

      The main response for improving accessibility for all groups is to develop "ultra-local" (ultra-proximité) provision across the whole country, so that taking up a mission does not require moving home.

      5. Notable Quotes

      On success and threat (committee chair): "The Cour des comptes emphasized, and I quote, that the civic service is a real success despite a few weaknesses.

      This assessment is therefore favorable today, and threatened by certain questions, not to say worries, about the future of this scheme."

      On the essence of the scheme (Priska Tevenot, Ensemble): "Committing oneself and learning about oneself is what distinguishes volunteering in the civic service from a mere student job. [...] The civic service is a school of engagement, a school of life."

      On budgetary rigor (Florence Joubert, Rassemblement National): "This scheme deserves support provided it is not misused.

      Because we are, after all, talking about public funding of nearly 600 million euros a year."

      On job substitution (Sophie Tailler Paulian, Écologiste): "How do we prevent the civic service from ultimately replacing real jobs, and from ultimately also being a kind of airlock [...] before entering a real job?"

      On sacrificing the Service Civique (Florence Erouin Léotet, Socialiste): "Yet it is to try to save this failing scheme [the SNU] that we would choose to sacrifice the civic service, a tool of emancipation and republican fraternity."

      On the blurring of purposes (Maxime Michelet, UDR): "The civic service sometimes seems to be more a tool for professional integration than for civic engagement."

      On the purpose of the scheme (Agency president): "The promise [of the Service Civique] is, once again, none other than to experience the general interest, republican cohesion and social mixing. So it is indeed a promise greater than that of a simple youth contract."

      On added value (Agency president): "It [the Service Civique] does not replace employment or public servants, but it humanizes public service. [...]

      It is really one of the drivers that makes the difference between a civic service commitment and a mere professional experience."

    1. Summary of the Flash Mission (mission flash) on Guidance Support for Pupils

      Summary

      This summary document presents the conclusions of the flash mission evaluating how pupils are supported in discovering occupations and in educational and career guidance (orientation), led by rapporteurs Arnaud Bonet and Laurent Croisier.

      After four months of work and more than 24 hearings, the report finds a guidance system perceived as a "perpetual building site" and a "steep path", a source of anxiety for pupils, families and teaching staff, owing to the absence of a clear national strategy and to the succession of reforms.

      The conclusions are organized around five main areas:

      1. A continuous guidance pathway: Guidance must be a long-term process, starting in primary school to deconstruct stereotypes and extending throughout schooling, with families closely involved.

      2. Individualized support: Appointing a guidance lead (référent orientation) drawn from the teaching staff in every school is deemed essential, as are the creation of an effective right to change track (réorientation) and the recognition of non-academic skills.

      3. Tackling inequalities: The report stresses that guidance outcomes remain strongly socially determined and proposes measures to combat self-censorship, raise the standing of the vocational track, and better support pupils with disabilities and pupils from the overseas territories.

      4. Mobilizing resources: Significant investment is needed, notably for certified training of teachers, funding of hours dedicated to guidance, and a review of the map of Centres d'Information et d'Orientation (CIO).

      5. Stronger coordination among stakeholders: Given the tensions and confusion arising from the division of responsibilities between the State and the Regions since 2018, the report recommends clarifying roles and better aligning actions to offer pupils a more coherent pathway.

      In total, 45 avenues for improvement are proposed to turn guidance from a pathway that is endured into a lever for equal opportunity and emancipation, enabling every young person to build a future of their own choosing.

      Detailed Analysis of the Report's Conclusions

      1. General Finding: A Fragmented and Anxiety-Inducing Guidance Pathway

      The rapporteurs open their analysis by describing guidance as a "perpetual building site" and a "steep and dreaded path".

      The system is marked by a succession of reforms which, for want of a genuine national strategy, have resulted in fragmented action.

      Guidance is too often experienced as a series of one-off, anxiety-inducing decisions rather than as a continuous, considered process.

      2. Area 1: Towards a Guidance Continuum from Primary School to Lycée

      To remedy this fragmentation, the report stresses the need to treat guidance as a process that unfolds over time.

      Discovering occupations from primary school: The report proposes starting the discovery of occupations as early as primary school.

      The aim is not to steer pupils early, but to broaden their horizons and "deconstruct the representations that lead to self-censorship", because "stereotypes do not wait until 5e to form" (5e being the second year of collège).

      Involving families: Considering parents to be the "first prescribers of guidance", the report recommends establishing a regular dialogue between families and teaching staff, with a first formal discussion as early as 5e.

      Transparency of information:

      ◦ Given information that is abundant but sometimes "paralyzing", the report welcomes ONISEP's role as the reference body.

      The new "Avenir(s)" platform, rolled out since December 2023, is intended to become the central tool for support from 5e through terminale.

      Uptake remains a challenge, however, with 86,000 pupils logged in as of 30 May 2024 against an initial target of 200,000.

      ◦ A warning is raised about the titles of diplomas and courses, which are often considered a source of confusion.

      Parcoursup: The platform is described as "complex, opaque and anxiety-inducing". The rapporteurs recommend:

      ◦ Enshrining in law the obligation of transparency for the algorithms (which are already public).

      ◦ Making the selection criteria of the application review committees (commissions de vœux) public and clearly worded.

      ◦ One of the rapporteurs recommends "seeking a credible alternative to Parcoursup" to guarantee unconditional admission to non-selective university programs.

      Reform of work placements:

      ◦ For the 3e placement, the report proposes allowing it to be split into several short experiences, to discover a wider range of occupations and to counter the reproduction of social inequalities.

      ◦ For the 2de placement, the report proposes removing its compulsory status to make it a "space for discovering and deepening a personal project".

      ◦ Wider use of "job shadowing" (following a professional for a day) is also recommended.

      3. Area 2: The Need for Personalized Support

      Individualized guidance support, although provided for in official texts, is not always delivered in practice.

      Three avenues are put forward:

      A guidance lead in every school: Appointing a "lead for guidance and the discovery of occupations" (référent pour l'orientation et la découverte des métiers) is recommended in every school, including general and technological lycées.

      This role should be given to a member of the teaching staff, not to a national education psychologist (psychologue de l'Éducation nationale, Psy-EN), for several reasons:

      ◦ Teachers are in daily contact with all pupils.

      ◦ There are too few Psy-EN (an estimated ratio of 1 per 1,200 to 1,300 pupils).

      ◦ Psy-EN divide their time among several schools, and their duties are now mostly focused on psychological support.

      An effective right to change track: School pathways are deemed "too rigid".

      The report calls for a "genuine right to change track" (droit à la réorientation), seen not as a failure but as an opportunity, by creating effective bridges between the different tracks.

      Recognizing non-academic skills: The report stresses the need to identify and highlight pupils' skills and resources, including those of pupils who are struggling academically.

      4. Area 3: Tackling Determinism and Inequalities

      Educational guidance remains "very largely socially determined". The report targets five fields of action:

      Combating self-censorship: Encourage peer-inspiration mechanisms ("role models") by drawing on former pupils, students or young professionals.

      Involving all families: Hold guidance events in community venues (tiers-lieux) such as neighborhood centers and town halls to reach the families furthest removed from school.

      Raising the standing of the vocational track: To counter the perception of the vocational track as a "default choice" and an "imposed orientation", the report proposes encouraging the creation of multi-track lycées (lycées polyvalents) and piloting mixed classes in seconde (general, technological and vocational) built around a common core.

      Pupils with disabilities:

      ◦ Guarantee priority access to boarding facilities.
      ◦ Automate the transfer of information on schooling accommodations between schools (with the family's consent).

      New baccalauréat holders from the overseas territories:

      ◦ Increase the amount of the "Parcours" grant (currently €500).
      ◦ Raise the tax ceiling (currently about €27,000) of the "Passeport pour la mobilité des études".

      5. Area 4: Human and Budgetary Resources to Mobilize

      Achieving these objectives requires concrete resources.

      Staff training: Introduce compulsory, certified guidance training for teachers, in both initial training (INSPÉ) and continuing training.

      Funding dedicated hours: The planned hourly volumes (12 h in 4e, 36 h in 3e, 54 h in lycée) are often indicative and unfunded.

      The report asks that these hours be built into the timetable and that the guidance lead receive a reduction in teaching load (décharge horaire) rather than a mere allowance through the "Pacte enseignant".

      Role of Psy-EN and the map of CIO:

      ◦ Update the Code de l'éducation, which still refers to "conseillers d'orientation-psychologues" (guidance counselor-psychologists), a corps abolished in 2017.
      ◦ Formalize by agreement the Psy-EN's role of supporting teachers.
      ◦ Review the map of the 411 CIO, whose number has been cut by a quarter in ten years, to ensure that no pupil is more than 45 minutes by public transport from a center.

      6. Area 5: Improving Coordination among Stakeholders

      The 2018 law entrusting information about occupations to the Regions has created "confusion" and "tension" with the State, which remains responsible for counseling.

      A blurred division of responsibilities: There is consensus on the need to clarify each party's role, without carrying out a further transfer of responsibilities to the Regions.

      Little-known regional provision: The Regions' work is poorly known in schools.

      According to the Cour des comptes (2022), only 22% of schools report using regional documentary resources and 12% report using regional schemes.

      Ineffective coordination tools: The annual guidance program, which is supposed to link the Region's actions with the school's own plan, is only very rarely put in place.

      Coordination recommendations:

      ◦ Improve communication about the services offered by the Regions.
      ◦ Ensure that the annual guidance program is put in place in every school.
      ◦ Map regional actions to identify areas that are not covered.
      ◦ Ensure that ONISEP's "Avenir(s)" platform showcases regional information, to avoid competition.
    1. Abstract: Water buffalo is a cornerstone livestock species in many low- and middle-income countries, yet major gaps persist in its genomic characterization, complicated by the divergent karyotypes of its two sub-species (swamp and river). Such genomic complexity makes water buffalo a particularly good candidate for the use of graph genomics, which can capture variation missed by linear reference approaches. However, the utility of this approach to improve water buffalo has been largely unexplored. We present a comprehensive pangenome that integrates four newly generated, highly contiguous assemblies of Pakistani river buffalo with available assemblies from both sub-species. This doubles the number of accessible high-quality river buffalo genomes and provides the most contiguous assemblies for the sub-species to date. Using the pangenome to assay variation across 711 global samples, we uncovered extensive genomic diversity, including thousands of large structural variants absent from the reference genome, spanning over 140 Mb of additional sequence. We demonstrate the utility of these data by identifying putative functional indels and structural variants linked to selective sweeps in key genes involved in productivity and immune response across 26 populations. This study represents one of the first successful applications of graph genomics in water buffalo and offers valuable insights into how integrating assemblies can transform analyses of water buffalo and other species with complex evolutionary histories. We anticipate that these assemblies, and the pangenome and putative functional structural variants we have released, will accelerate efforts to unlock water buffalo’s genetic potential, improving productivity and resilience in this economically important species.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf099), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 4: Wai Yee Low

      Review of "A comprehensive water buffalo pangenome reveals extensive structural variation linked to population specific signatures of selection". This is an impressive work at the frontier of buffalo genomics. I truly enjoyed reading the work, and my questions/comments are aimed at improving it further. My detailed comments are below:

      Line 30: I think it would be better to include the actual number of publicly available assemblies used to create the pangenome graph.
      Line 71: There is now a swamp buffalo reference genome with annotation too (NCBI accession: PCC_UOA_SB_1v2). Perhaps consider citing the swamp buffalo reference (https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giae053/7753516) and rewriting the sentence to say that a pangenome can be used for both swamp and river, but a single linear reference from either subspecies is not good enough for read mapping.
      Line 79: "highlighted"
      Line 82: What do you mean by "higher quality"? The assemblies have been discussed in this review: https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.629861/full
      Line 105: Technically, the graph method for bovine species, which includes water buffalo, is being investigated by the Bovine Pangenome Consortium (BPC). However, nothing useful has been published on the buffalo graph, but perhaps consider citing the BPC since your paper overlaps with it (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02975-0).
      Line 165: It would be good to add a bit more context on the PanGenie method here, as researchers in the buffalo community are not used to it. Additionally, it would be great if all code were made available on GitHub or as Supplementary Info.
      Line 170: To produce a phased pangenome graph, don't you need all input assemblies to be phased? Are all input assemblies phased? The UOA_WB_1 is locally phased, not phased throughout the genome.
      Line 235: "a list of 403 unrelated individuals." What does this translate to in terms that geneticists can understand? Do you mean siblings have been removed? Or individuals sharing the same grandparents were removed?
      Line 246: Can you please explain how you got the coordinates to match between the GATK and PanGenie methods? You'll need matching coordinates for concordance analysis. As I understand it, the GATK was based on UOA_WB_1?
      Line 254: Why these 3 chromosomes?
      Line 257: If you had not filtered for relatedness, how would it impact the selective sweep work? I think including some context will help the readers.
      Line 259: Do you mean at least six samples per group? If yes, is 6 samples enough?
      Line 261: Genotype quality less than 25 according to bcftools? Since you only used biallelic variants, please provide the breakdown between biallelic and multiallelic.
      Line 281: "… we first PacBio HiFi sequenced one female" Please rewrite this.
      Line 282: How common are these two breeds in percentage?
      Line 291: Is this already known? Perhaps cite the literature to show the agreement with previous studies?
      Fig 1D: This is a bit too small to see, especially the SV distribution at the bottom. I can hardly see the median.
      Line 310: Why did you choose UOA_WB_1 as the reference?
      Line 311: Do the ~32.8 million variants include SNPs as well?
      Fig 2: This is probably a panel of a figure but should not be the entire figure. The size of the circle indicates sample size, but there should be a legend on the plot stating the sizes, right? Should a darker colour be used to highlight the countries with samples instead of white? Maybe this could be a Supp figure too.
      Line 356: Should S Figures 4 and 5 be main figures? You will need to annotate the abbreviation of sample-country in the legend of S Figure 5.
      Line 360: "To enable reuse we have made this dataset available …" The dataset should be made available to reviewers?
      Line 368: "76% of SNVs were called by both callers" 76% seems low. Also, called does not mean concordant. What is the concordance among called SNVs in both? Did the pangenome approach call most of the variants found in GATK? If not, what might be the reasons?
      Fig 3B: It is not immediately clear what the difference is between non-repetitive and repetitive regions. The overlapping text on the x-axes makes it hard to read.
      Line 390: "Analyses such as the study of selective sweeps or genome-wide association studies where low frequency variants are often filtered out will benefit less from the advantages of GATK, particularly given its longer run time." From here on, in this paragraph, it's Discussion, not Results.
      Line 418: Why human? Could you use cattle?
      Line 427: I tried the browser and am not sure what I can learn from it. It would be helpful if there were a README with some examples of what can be explored.
      Line 450: How large must a variant be before you consider it a larger variant? Does this ability to study larger variants still hold despite using only ~10 assemblies in the graph? Will the use of short reads for the selective sweep study still benefit from being able to incorporate these larger variants? As I understand it, the larger variants were found only from the graph, not from the short reads. As such, the selective sweep may not be associated with any larger variants?
      Line 470: Should Fig S8 be a main figure?
      Line 513: Instead of a UniProt link, perhaps consider including this as Supplementary info or text. The info in the link may change in the future.
      Line 551: However, without scaffolding, the assemblies of Pakistani river buffalo may not be good enough to function as reference genomes for river buffalo?
      Line 552: When considering new bases, did you do this for each assembly independently, or were the new bases discovered cumulatively?
      Line 581: Some of my questions at Line 450 can be discussed here.
      Line 586: Perhaps consider discussing the limitations of the small number of assemblies used to create the graph. As such, many SVs are likely still missing and we are still unable to properly assess the allele frequency of these larger SVs. Additionally, while some SVs may not be considered large in this work, it does not mean they have no impact.


      Reviewer 3: Laura Caquelin

      1. Summary of the Study: This study used graph genomics to better characterize water buffalo genomes. By building a pangenome from new and existing assemblies, the authors analyzed 711 samples. These samples revealed structural variation. These results highlight the value of graph genomics. This method

      2. Scope of reproducibility: According to our assessment the primary objective is: to identify genomic variants within selective sweep regions in the water buffalo genome.

      3. Outcome: Enrichment of high-impact structural variants (SVs), insertions/deletions (indels) and single nucleotide variants (SNVs) in selective sweep regions.
      4. Analysis method outcome: Variants were compared between selective sweep regions and genome-wide. Fisher's exact test was used to assess enrichment of functional variants.
      5. Main result: "Prior to annotation, multiallelic variants were normalized by splitting them into separate biallelic entries, resulting in 6,159,686 indels, 28,669,966 SNVs, and 160,921 SVs entries. Within putative selective sweep regions we identified 208,862 indels, 997,500 SNVs and 6,748 SVs. Notably an enrichment of HIGH impact SVs, indels and SNVs were observed within selective sweep regions (Figure 5A, Supplementary Table S6), with 50-80% more variants in these areas having a HIGH impact compared to genome-wide. Among the high impact variants in selective sweep regions only 20% were SNVs, with the remainder being SVs and indels, suggesting high impact larger variants may underlie putative selective sweeps." (Lines 453 to 461)
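
      As a rough illustration of the enrichment test described above, the following Python sketch computes a ratio of proportions and a two-sided Fisher's exact p-value for a single variant type. It is not the authors' "variantEnrichAtPeaks.R" script: the counts are hypothetical, the function names are invented for this example, and the layout of the 2x2 table (sweep regions versus the rest of the genome, HIGH impact versus all other impact classes) is an assumption about how such a test might be set up.

```python
import math


def log_comb(n, r):
    """Natural log of the binomial coefficient C(n, r), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def log_pmf(k):
        # Hypergeometric probability of seeing k in the top-left cell
        # when all row and column totals are held fixed.
        return log_comb(col1, k) + log_comb(total - col1, row1 - k) - log_comb(total, row1)

    log_observed = log_pmf(a)
    k_min, k_max = max(0, row1 + col1 - total), min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the observed one.
    p = sum(
        math.exp(log_pmf(k))
        for k in range(k_min, k_max + 1)
        if log_pmf(k) <= log_observed + 1e-7
    )
    return min(1.0, p)


def ratio_of_proportions(high_sweep, n_sweep, high_genome, n_genome):
    """(HIGH share inside sweep regions) / (HIGH share genome-wide)."""
    return (high_sweep / n_sweep) / (high_genome / n_genome)


# Hypothetical counts for one variant type (NOT taken from Table S6).
high_sweep, n_sweep = 30, 1_000        # HIGH-impact / all variants in sweep regions
high_genome, n_genome = 200, 10_000    # HIGH-impact / all variants genome-wide

ratio = ratio_of_proportions(high_sweep, n_sweep, high_genome, n_genome)

# Assumed table layout: sweep regions vs. the rest of the genome,
# HIGH impact vs. every other impact class.
p_value = fisher_exact_two_sided(
    high_sweep, n_sweep - high_sweep,
    high_genome - high_sweep, (n_genome - n_sweep) - (high_genome - high_sweep),
)
print(f"ratio of proportions = {ratio:.2f}, Fisher exact p = {p_value:.3g}")
```

      With these made-up counts the ratio is about 1.5, which is the kind of figure the phrase "50-80% more variants ... having a HIGH impact" refers to. Working in log space keeps the hypergeometric terms stable for counts in the millions.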

      6. Availability of Materials

      a. Data

      7. Data availability: Open
      8. Data completeness: Complete, all data necessary to reproduce main results are available
      9. Access Method: Supplementary files - Repository: -
      10. Data quality: Structured

      b. Code
      11. Code availability: Shared for the review after request - Programming Language(s): R
      12. Repository link: -
      13. License: -
      14. Repository status: -
      15. Documentation: No documentation

      16. Computational environment of reproduction analysis

      17. Operating system for reproduction: MacOS 14.7.4
      18. Programming Language(s): R
      19. Code implementation approach: Creating script according to the methodology description/Using shared code
      20. Version environment for reproduction: R version 4.4.1/RStudio 2024.09.0

      21. Results

      5.1 Original study results

      22. Results 1: Results are presented in Figure 5A.

      5.2 Steps for reproduction -> Reproduce the results

      The code was not shared initially, but as the data were provided and the test was a Fisher's exact test, I wrote code to reproduce the p-values.

      23. Issue 1: P-values for the SNV variant type as well as the "Modifier" impact class were not provided. -- Resolved: Authors provided an updated Supplementary Table S6 with exact numerical p-values for each variant type and each impact class. The code "variantEnrichAtPeaks.R" to generate Figure 5A and Supplementary Table S6 was also shared. New version of Supplementary Table S6: (see screenshot)

      The comparison between the reproduced results and the original results was then performed using the shared code. (Notably, the results from the R script written allowed for the generation of the same p-value as the one presented in Figure 5A).

      • Issue 2: In the script "variantEnrichAtPeaks.R", only the figures were generated, not the new Supplementary Table S6 with the numerical p-values. -- Resolved: Some lines of code were added to the function "makePlot" to generate this table in addition to the figure.

      Lines 159 to 178 of the script "variantEnrichAtPeaks_RCC."

      # Supplementary table S6 (add)
      # Requires dplyr (%>%, mutate, left_join, select); df, pval_df, variantType and p come from makePlot.
      summary_table <- df %>%
        mutate(
          Type = variantType,
          # Share of each impact class genome-wide and inside sweep peaks
          Genome_Wide_Prop = Genome_wide / sum(Genome_wide),
          Selective_Sweep_peaks_Prop = Sweep / sum(Sweep),
          Ratio_of_proportions = Selective_Sweep_peaks_Prop / Genome_Wide_Prop
        ) %>%
        left_join(pval_df, by = "Impact") %>%
        select(
          Impact,
          Type,
          Genome_Wide = Genome_wide,
          # Column names containing spaces must be wrapped in backticks
          `Selective_Sweep peaks` = Sweep,
          `Genome_Wide Prop` = Genome_Wide_Prop,
          `Selective_Sweep peaks Prop` = Selective_Sweep_peaks_Prop,
          `Ratio of proportions` = Ratio_of_proportions,
          `Fishers exact P` = p_value
        )

      return(list(plot = p, summary_table = summary_table))

      5.3 Statistical comparison Original vs Reproduced results
      - Results: Figure and Table S6 were reproduced for each variant type and impact:
      -- SVs type: (see screenshot)
      -- Indels type: (see screenshot)
      -- SNVs type: (see screenshot)

      • Comments: The shared code was used to compute the p-values and generate the figures. Minor numerical discrepancies were observed for some p-values, likely due to rounding differences. The p-values in the original Excel file appear to be stored with fewer decimal places than those computed in R. This difference is negligible and does not indicate a reproducibility issue.
      • Errors detected: No error detected.
      • Statistical Consistency: The results were successfully reproduced with the shared code.
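
      One way to check that such discrepancies come only from rounding is to round the recomputed p-value to the precision of the stored one before comparing them. The sketch below is a minimal Python illustration of that idea, not part of the reproduction code; the helper names and the two example p-values are hypothetical.

```python
import math


def round_sig(x, digits):
    """Round x to `digits` significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


def agrees_after_rounding(stored, recomputed, digits):
    """True if `recomputed`, rounded to `digits` significant digits, equals `stored`."""
    return math.isclose(round_sig(recomputed, digits), stored, rel_tol=1e-9)


# Hypothetical values: a p-value kept with 3 significant digits in a spreadsheet
# and the same p-value recomputed at full double precision.
stored_p = 3.21e-5
recomputed_p = 3.2149e-5

print(agrees_after_rounding(stored_p, recomputed_p, digits=3))
print(agrees_after_rounding(stored_p, 3.26e-5, digits=3))
```

      A value that still differs after rounding to the stored precision would point to a real discrepancy and not a storage artefact.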

      • Conclusion

      • Summary of the computational reproducibility review: The Fisher's exact tests for enrichment across variant and impact categories, presented in Figure 5A of the manuscript, were successfully reproduced using the data in Supplementary Table S6 and the shared code. Results were consistent with the original, with only negligible rounding differences in p-values.
      • Recommendations for authors: We were able to reproduce the study with the data and information provided in the Figure 5A description. To further improve transparency and ensure full reproducibility of your manuscript, the following recommendations are suggested:
      -- Make the code to reproduce all analyses in the paper openly available to allow anyone to reproduce the results. Ideally, provide a README or requirements.txt file describing how to run the analysis, including software versions, packages, and dependencies.
      -- Include statistical outputs, such as exact p-values, in supplementary materials when possible. This ensures clarity and eases verification.
      -- Ideally, provide metadata: for the datasets used or generated by the scripts, it would be helpful to include accompanying metadata files that explain:
      --- The definition of each variable name.
      --- The origin of each dataset (raw, processed, etc).
      --- Any preprocessing steps applied before analysis.
    1. Recent advancements in transcriptomics and proteomics have opened the possibility for spatially resolved molecular characterization of tissue architecture with the promise of enabling a deeper understanding of tissue biology in either homeostasis or disease. The wealth of data generated by these technologies has recently driven the development of a wide range of computational methods. These methods have the requirement of advanced coding fluency to be applied and integrated across the full spatial omics analysis process thus presenting a hurdle for widespread adoption by the biology research community. To address this, we introduce SPEX (Spatial Expression Explorer), a web-based analysis platform that employs modular analysis pipeline design, accessible through a user-friendly interface. SPEX’s infrastructure allows for streamlined access to open source image data management systems, analysis modules, and fully integrated data visualization solutions. Analysis modules include essential steps covering image processing, single-cell and spatial analysis. We demonstrate SPEX’s ability to facilitate the discovery of biological insights in spatially resolved omics datasets from healthy tissue to tumor samples.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf090), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Ka Yee Yeung

      Li et al. presented SPEX (Spatial Expression Explorer), a web-based open-source end-to-end analysis platform offering modular design and a user-accessible interface. The authors demonstrated use cases in spatial transcriptomics (MERFISH lung cancer) and spatial proteomics datasets (tonsil, public multiplex ion beam imaging data). SPEX includes the following analytical modules: 1. image processing modules, which include a 4-step sequence (image pre-processing, single-cell segmentation, post-processing, feature selection). Image loading supports OMERO integration. Output is a cell by expression matrix in AnnData format. 2. clustering modules for both spatial transcriptomic and proteomic data. 3. a spatial analysis module that implements the CLQ (Colocation Quotient) method. 4. a spatial expression analysis module that includes differential expression and pathway analysis. SPEX supports visualization via Vitessce.

      The paper is well written and addresses a rising interest and critical need in the biomedical community. The reviewer would like to request clarifications on how extensible the modules are. The authors mentioned a SPEX pipeline builder in which "modules are selected from a library and dragged into a visual pipeline map", and also mentioned the support for "flexible plug-in analysis modules". What are the packages available from the library? Can users import their own code or script or package? How can new plug-ins be created?

      The reviewer also wonders how users interact with the results. Can the user click on the resulting image and select regions of interest to zoom in?

    1. The photo above shows the ENIAC [b123] computer (built with US Army funds in 1945, this was the first electronic general-purpose computer), being programmed by three of the six women

      I took a CMS class where we spoke a lot about the ENIAC and human computers. These women were incredible and I am excited to learn how to code.

    1. Reviewer #2 (Public review):

      Summary:

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists with limited coding experience working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      As with any software package, this one necessarily makes a number of design choices, which may or may not fit the needs of all users. Those who prefer a more automated pipeline with fewer knobs to turn may appreciate AVN in cases where the existing recipes fit their needs, while those who require more customization and flexibility may require a more bespoke (and thus code-intensive) approach.

      Strengths:

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses:

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: It's important to note that the package is trying to do many things, of which it is likely to do several well and a few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows.

      In the revised version of the paper, the authors have expanded their case for the design choices made in AVN and remain committed to maintaining the tool. Given the low cost for users in trying new methods and the work the authors have put into further reducing this overhead via documentation, those curious about the package are likely best served by simply downloading it and giving it a try on their own data.

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While, to my knowledge, this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.

      Update: The authors now provide an extensive comparison with the Goffinet et al. paper and also consider differences between MMD and EMD. This comparison both adds value to the original paper and provides useful benchmarking for others looking to develop latent space comparison methods.
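
      To make the relationship between the two discrepancy measures in point (1) concrete, here is a minimal one-dimensional sketch in Python. It is not the implementation used by the authors or by Goffinet et al.: real syllable embeddings are multi-dimensional, and the equal-sample-size shortcut for EMD, the Gaussian kernel, and the `bandwidth` default are all simplifying assumptions made for illustration.

```python
import math


def emd_1d(xs, ys):
    """Earth mover's (Wasserstein-1) distance between two equal-sized 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this simplified version needs equal sample sizes")
    # In 1-D with equal sizes, the optimal transport plan pairs sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def mmd2_rbf(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel."""
    def kernel(u, v):
        return math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2))

    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical 1-D "embeddings" for a tutor and two pupils.
tutor = [0.0, 1.0, 2.0]
close_pupil = [1.0, 2.0, 3.0]
far_pupil = [10.0, 11.0, 12.0]

emd_close, emd_far = emd_1d(tutor, close_pupil), emd_1d(tutor, far_pupil)
mmd_close, mmd_far = mmd2_rbf(tutor, close_pupil), mmd2_rbf(tutor, far_pupil)
```

      Both measures are zero for identical samples and grow as the distributions separate. In this toy setting EMD keeps growing linearly with the shift, while the kernel-based MMD saturates once the samples are several bandwidths apart, which is one reason the two can rank pupils differently in practice.
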

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

    2. Reviewer #3 (Public review):

      This paper introduces the Avian Vocalization Network (AVN), a novel birdsong analysis pipeline using deep learning. By automating vocal annotation tasks, the AVN generates interpretable song features and song similarity scores on novel datasets without retraining. The performance of the network is solid and is comparable to that of human annotators.

      The authors have improved the manuscript in several aspects, such as the comparison with the Goffinet work. Overall, the AVN feature set could become a useful tool for evaluating birdsongs. But the authors also chose not to address a certain number of criticisms, and some issues remain poorly addressed, and the work is not reproducible at this stage. With a little effort, these issues could get resolved in my view. I will just pick on four issues that I think can be easily addressed:

      (1) Limitation of feature set: They claim that AVN satisfies the criteria (line 60) of "creating a common feature space for the comparison of behavioural phenotypes ..."(line 51), but then on LDA analysis, explained on line 910 they say "excluding amplitude and amplitude modulation features as they were found to vary". Since their feature set is not stable and not truly 'common' to all tasks, this limitation needs addressing in the discussion (that some features seem to vary undesirably, and they need exclusion based on some criteria to be defined).

      (2) Missing information on classification training loss: The Authors insist that their triplet loss is not related to classification, and they brush off my request for more information. In their rebuttal, they write: 'The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different.' Perplexingly, however, in the revised paper, the authors themselves speak of 'classes', in line 1004: 'this allows the model to begin learning an easier task, of separating syllables of different classes by a smaller margin.' So it seems the authors actually agree with me that there is an underlying classification task. I am therefore going to make it a bit more explicit here what I'm asking for, hoping this will better resonate with them.

      In line 984 they define their loss function and in lines 994-996 they define 'hard' and 'semi-hard' triplets. Authors then train a system to minimize the loss with a ratio of 75 percent semi-hard triplets and 25 percent hard triplets and a final weighing parameter value alpha=0.7. What I'm asking for is this 'classification' loss their trained model achieves, or in other words, the fraction of triplets that end up producing a loss, either of the 'hard' or 'semi-hard' type. For example, if their model manages to separate all 'possible triplets' by a margin of at least alpha, then the loss would be zero. If the model manages to separate all triplets except one, then the loss would correspond to the amount by which the separation differences between the anchor and the positive vs negative samples exceeds alpha. So, an important number to provide in the paper is the fraction of triplets that incur a nonzero loss, i.e., the fraction of semi-hard triplets. And another important quantity is the fraction of hard triplets, i.e. the fraction of triplets that would incur a loss if alpha were set to zero, or, in other words, the triplets for which the negative sample is closer to the anchor than the positive sample. By the way, I assume this latter fraction of hard cases will be zero - that their model does not confuse any positive and negative training samples...

      Note: the quantification chosen by the authors termed 'contrast index' is interesting, but it is a derived quantity, not the quantity the authors chose to optimize during training. If authors were to report both the training loss achieved and the 'contrast index', follow-up work could be benchmarked against both these quantities. If, for example, a follow-up model achieves smaller loss but worse contrast, then the loss is not a good placeholder measure for optimizing contrast. Alternatively, follow-up work could focus on the contrast index as training objective, obviating the need for the triplet loss as an intermediate step (I don't buy the authors' argument that such an optimization would be infeasible).
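
      The quantities requested in point (2) can be computed directly from a set of embedded triplets. The sketch below is a hypothetical Python illustration, not the authors' training code: it assumes Euclidean distances between embeddings (the paper may use a different distance), uses the common definitions of 'hard' (negative closer to the anchor than the positive) and 'semi-hard' (correct order, but within the margin alpha), and the function name `triplet_stats` is introduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_stats(triplets, alpha):
    """Mean triplet loss plus fractions of hard, semi-hard and nonzero-loss triplets.

    Each triplet is (anchor, positive, negative), given as embedding vectors.
    """
    if not triplets:
        raise ValueError("need at least one triplet")
    total_loss = 0.0
    n_hard = n_semi_hard = n_nonzero = 0
    for anchor, positive, negative in triplets:
        d_pos = euclidean(anchor, positive)
        d_neg = euclidean(anchor, negative)
        loss = max(0.0, d_pos - d_neg + alpha)
        total_loss += loss
        n_nonzero += loss > 0.0
        if d_neg < d_pos:
            n_hard += 1        # negative is closer to the anchor than the positive
        elif d_neg < d_pos + alpha:
            n_semi_hard += 1   # correct order, but separation is below the margin
    n = len(triplets)
    return {
        "mean_loss": total_loss / n,
        "frac_hard": n_hard / n,
        "frac_semi_hard": n_semi_hard / n,
        "frac_nonzero_loss": n_nonzero / n,
    }


# Hypothetical 2-D embeddings: two easy, one semi-hard and one hard triplet.
example = [
    ((0.0, 0.0), (1.0, 0.0), (3.0, 0.0)),
    ((0.0, 0.0), (1.0, 0.0), (1.5, 0.0)),
    ((0.0, 0.0), (2.0, 0.0), (1.0, 0.0)),
    ((0.0, 0.0), (1.0, 0.0), (5.0, 0.0)),
]
stats = triplet_stats(example, alpha=0.7)
```

      Reporting these fractions on held-out triplets alongside the mean loss would give follow-up work a benchmark on the quantity that was actually optimized.
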

      (3) Reproducibility: they explain the way they train the CNN with triplet loss to produce the embeddings, but we're missing both actual scripts on GitHub to train and run inference from scratch, and model weights, or even the hyperparameters they used. Authors only provide the architecture, and I don't think that's enough to be considered replicable by today's standards. I would suggest they release complete model checkpoint weights for the result they report, the exact data splits, the hyperparameters they used and training and testing code, so that one can very easily verify their claims and apply their methods to other datasets. Note: for example, the code to extract the embeddings is incomplete (the function definition of single_bird_extract_embeddings cannot be found on GitHub) and the model weights they used are missing.

      (4) With regards to the age prediction model, the authors should specify that this model is mainly useful for comparisons across studies but less so for precise evaluation of the effects of a treatment within a study. Namely, the effect on song of a treatment is best assessed by comparison to within-subject past song, and by comparison to age-matched control birds (ideally siblings) raised in identical conditions, rather than to invoke a generic model trained on other birds and from different colonies and breeding conditions as authors propose to do. In other words, to introduce a generic model for evaluation of song maturity introduces measurement noise in terms of the additional birds and their variable conditions, which can hinder precise assessment of treatment effects. Note that to state that in past work such maturity models were used is not a good justification, scientifically speaking.

      Finally, the authors write that methods for syllable segmentation have not been systematically compared, but the WhisperSeg work they use did such a comparison. So the authors should revise their novelty claim of being the first to compare syllable segmentation methods.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets.

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure.

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al, 2024), TweetyNet (Cohen et al, 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al, 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization - syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al, 2020 - so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature).

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously only ever been used to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the completely automatically generated labels are for downstream analysis. We have added a couple of sentences to the introduction to emphasize the novelty of this approach and validation.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables will be plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There don’t appear to be any similar visualizations in Cohen et al. 2020.
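
      The three processing steps described in this response (index-based positions, alignment on the first motif syllable, and grouping of identical sequences) can be sketched in a few lines of Python. This is a hypothetical illustration rather than AVN's implementation: the function name `syntax_raster_rows`, the single-character syllable labels, the choice to skip bouts lacking the motif-start syllable, and ordering groups by frequency are all assumptions made here.

```python
from collections import Counter


def syntax_raster_rows(bouts, motif_start):
    """Align bouts on the first `motif_start` syllable and group identical sequences.

    bouts: list of strings, one per bout, one character per syllable label.
    Returns (offset, bout) rows; offset is the column where the bout begins so
    that `motif_start` falls in the same column for every row.
    """
    aligned = []
    for bout in bouts:
        lead = bout.find(motif_start)  # number of introductory syllables
        if lead == -1:
            continue  # bouts without the motif-start syllable are skipped here
        aligned.append((lead, bout))
    if not aligned:
        return []

    max_lead = max(lead for lead, _ in aligned)
    # Count how often each post-alignment sequence occurs across bouts.
    counts = Counter(bout[lead:] for lead, bout in aligned)
    # Most common sequences first; identical sequences end up adjacent.
    aligned.sort(key=lambda item: (-counts[item[1][item[0]:]], item[1][item[0]:]))
    return [(max_lead - lead, bout) for lead, bout in aligned]


# Hypothetical bouts: 'i' marks introductory notes, 'a' starts the motif.
rows = syntax_raster_rows(["iiabc", "iabd", "iabc", "abc"], motif_start="a")
for offset, bout in rows:
    print(" " * offset + bout)
```

      Each row can then be drawn as a strip of colored cells, one per syllable index, so that stereotyped birds produce large uniform blocks and variable birds produce a patchwork.
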

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as an input. Each triplet is essentially 3 images stacked together such that you have (anchor, positive, negative): anchor is a reference spectrogram from, say, finch A; positive means a different spectrogram with the same label as the anchor from finch A; and negative means a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then made use of earth mover distance to quantify the similarity in the syllable distribution among finches. They then compared the performance of their approach with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In this paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song.

      We thank the reviewer for this suggestion. We have included a comparison of our triplet loss embedding model to the VAE model proposed in Goffinet et al. 2021. We also included comparisons of similarity scoring using each of these embedding models combined with either earth mover’s distance (EMD) or maximum mean discrepancy (MMD) to calculate the similarity of the embeddings, as was done in Goffinet et al. 2021. As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the Triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach.

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on their `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing-and-matching of different spectrogram generation, segmentation, annotation etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful.
It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future, and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases. 

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac-compatible version, should the demand present itself. That said, the Python package works on all operating systems, so non-Windows users still have the ability to use AVN that way.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.

      We recognize the similarities between these approaches and have included comparisons of the VAE and MMD as in the Goffinet paper to our triplet loss model and EMD.  As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the Triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach. 

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.  

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks that they use to train deep networks. They then use these networks as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and data from another colony that the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field.

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies.

      (3) Automation and Ease of Use. By being fully automated, the method is straightforward to apply and should pose barely any barrier to adoption by other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior.  

      We appreciate this reviewer’s concerns and apologize for not providing a sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as the ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed how the LDA model that can discriminate between typical, deaf, and isolate birds’ songs not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble deaf birds who lack a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers. 

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We have added a couple lines to the ‘Comparing Song Disruptions with AVN Features’ and ‘Tracking Song Development with AVN Features’ sections of the results to make this more clear. 
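
      The lead/lag comparison described above amounts to checking the sign and size of the mean difference between predicted and true ages. A minimal sketch follows; the function name and all ages are invented for illustration and do not come from the manuscript or from the AVN API.

```python
from statistics import mean

def mean_age_offset(true_ages, predicted_ages):
    """Mean of (predicted - true) age in days.

    Positive values mean songs look older than the birds are (lead);
    negative values mean songs look younger (lag).
    """
    return mean(p - t for t, p in zip(true_ages, predicted_ages))

# Hypothetical ages in days post hatch; every number here is made up.
true_ages = [50, 60, 70, 80]
control_pred = [51, 59, 70, 80]
treated_pred = [45, 54, 63, 74]

print(mean_age_offset(true_ages, control_pred))  # no systematic offset
print(mean_age_offset(true_ages, treated_pred))  # predicted ages lag true ages
```

      In practice one would also want a significance test on the per-bird offsets and a comparison against the model's baseline error on unmanipulated birds.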

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      This reviewer appears to have misunderstood our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and have added a paragraph to the ‘Measuring Song Imitation’ section of the results explaining this rationale more briefly.

      First, nowhere are we training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied on its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (this is a same vs. different category classifier), and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces, or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021. 
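
      The distinction between a relative-distance objective and a same/different classifier is easiest to see in the loss itself. The sketch below shows a generic hinge-style triplet loss on already-computed embeddings. The margin of 1.0, the use of unsquared Euclidean distance, and the 2-dimensional toy points are assumptions for illustration; the manuscript's exact formulation may differ (some variants use squared distances).

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is; otherwise grows linearly with the violation."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = (0.0, 0.0)         # embedding of a syllable with label 'a'
positive = (0.1, 0.0)       # another syllable with label 'a'
far_negative = (3.0, 0.0)   # label 'b', already well separated
near_negative = (0.5, 0.0)  # label 'b', too close to the anchor

print(triplet_loss(anchor, positive, far_negative))
print(triplet_loss(anchor, positive, near_negative))
```

      Nothing in this objective outputs a "same" or "different" decision. Training only moves embeddings until the ordering of distances is satisfied, which is why the resulting space can be reused for syllables from birds that were never seen during training.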

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower dimensional space, then faithfully reconstruct them from the compressed representation.  This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we have included a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript. 

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, this reviewer seems not to understand our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow. 

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical to songbird research. 

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN).  

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and amplitude thresholding) as this hadn’t been done previously. We chose not to perform an extensive comparison of different clustering methods, as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird.  

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently than the F1 scores we report here and are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared. 

      The discrepancy between the F1<sub>seg</sub> scores reported in Gu et al. 2023 and the segmentation F1 scores that we report is likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and non-stationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see Figure 2–figure supplement 1b), but still very high precision scores (Figure 2–figure supplement 1a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Figure 3c) or syllable duration entropy (Figure 3–figure supplement 2a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses. 
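
      The effect of merging on onset-level scores can be illustrated with a small sketch. The one-to-one matching rule and the 10 ms tolerance used here are assumptions chosen for the example and are not necessarily the scoring procedure used in the manuscript; the onset times are invented.

```python
def onset_scores(true_onsets, pred_onsets, tolerance=0.01):
    """Precision, recall and F1 for onset detection.

    Each predicted onset may match at most one ground-truth onset that
    lies within `tolerance` seconds of it.
    """
    unmatched_true = sorted(true_onsets)
    true_positives = 0
    for onset in sorted(pred_onsets):
        # Ground-truth onsets that are still free and close enough.
        candidates = [t for t in unmatched_true if abs(t - onset) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda t: abs(t - onset))
            unmatched_true.remove(closest)
            true_positives += 1
    precision = true_positives / len(pred_onsets) if pred_onsets else 0.0
    recall = true_positives / len(true_onsets) if true_onsets else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Ground truth splits two elements separated by a very short gap
# (onsets at 0.25 s and 0.28 s); the prediction merges them.
true_onsets = [0.10, 0.25, 0.28, 0.50]
merged_prediction = [0.10, 0.25, 0.50]

precision, recall, f1 = onset_scores(true_onsets, merged_prediction)
print(precision, recall, round(f1, 3))
```

      Every predicted onset is correct, so precision stays at 1.0, while the missed onset of the merged element lowers recall to 0.75 and pulls F1 down with it.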

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well.  

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well if not better on birds from Rockefeller than from UTSW. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. While we didn’t have manual annotations for this dataset (which would allow validation of our segmentation and labeling methods), we did apply AVN to recordings shared with us by the Wada lab in Japan, where visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets. 

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.  

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology, and is outside the scope of our current efforts.

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method.  

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human feedback into AVN was never the goal of our pipeline, would require significant changes to AVN’s design and is outside the scope of this manuscript.

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two syllable types or one.  

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types, as mentioned in lines 222-226 of the revised manuscript. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#Syllable-Repetitions for further details.

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy.  

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We have expanded our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (Figure 2–figure supplement 1). In rare cases, WhisperSeg also fails to recognize syllables entirely, again impacting its recall score. TweetyNet hardly ever completely ignores syllables, but it does tend to occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables whereas WhisperSeg hardly ever segments non-vocal sounds. 

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types; one corresponding to ‘a’, one corresponding to ‘b’ and one corresponding to ‘ab’ merged. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label (or is missing a label entirely), but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or if it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, sometimes rare syllable types with their own distinct labels in the manual annotation are merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not to occasional mis-classifications of one manual label type as another.
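
      To make the effect of such splitting on the v-measure concrete, the sketch below computes homogeneity, completeness and their harmonic mean from scratch for a toy case in which manual label ‘a’ is spread over two clusters while ‘b’ keeps a single cluster. The label sequences are invented, and the alignment of syllables across segmentation schemes described above is not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(labels, given):
    """H(labels | given), computed from joint counts."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())

def v_measure(manual, clusters):
    """Return (homogeneity, completeness, v-measure) with equal weighting."""
    h_manual, h_clusters = entropy(manual), entropy(clusters)
    homogeneity = 1.0 if h_manual == 0 else \
        1.0 - conditional_entropy(manual, clusters) / h_manual
    completeness = 1.0 if h_clusters == 0 else \
        1.0 - conditional_entropy(clusters, manual) / h_clusters
    total = homogeneity + completeness
    v = 2 * homogeneity * completeness / total if total else 0.0
    return homogeneity, completeness, v

manual = ["a", "a", "a", "a", "b", "b", "b", "b"]
clusters = [0, 0, 2, 2, 1, 1, 1, 1]  # 'a' split across clusters 0 and 2

homogeneity, completeness, v = v_measure(manual, clusters)
print(round(homogeneity, 3), round(completeness, 3), round(v, 3))
```

      Each cluster still contains a single manual label, so homogeneity remains 1.0; only completeness drops, which matches the description above of errors arising from splitting or merging groups rather than from confusing one label with another.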

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in figure 2f and Figure 2–figure supplement 3b&e. We also added two paragraphs to the end of the ‘Accurate, fully unsupervised syllable labeling’ section of the Results in the revised manuscript. 

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy.  

      We apologize for not making this distinction sufficiently clear in the manuscript and have added a paragraph to the ‘Measuring Song Imitation’ section of the Results explaining the rationale for using an embedding model for similarity scoring. 

      We chose to use UMAP for syllable labeling because it is a common embedding methodology to precede hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD or MMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. This cannot be achieved with UMAP because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD or MMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes infeasible without dedicated high-performance computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD and MMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space. 

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate.

      There is indeed a stochastic element to UMAP embeddings which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. We observed that v-measure scores were quite consistent within birds across repeated runs of UMAP, and have added a supplementary figure to the revised manuscript showing this (Figure 2–figure supplement 4).

      Reviewer #1 (Recommendations For The Authors):

      (1) Benchmark their similarity score to the method used by Goffinet et al, 2021 from the Pearson group. Such a comparison would be really interesting and useful.  

      This has been added to the paper. 

      (2) Please clarify exactly what is new and what is applied from existing methods to help the reader see the novelty of the paper.  

      We have added more emphasis on the novel aspects of our pipeline to the paper’s introduction. 

      Minor:

      It's unclear if AVN is appropriate as the paper deals only with zebra finch song - the scope is more limited than advertised.

      We assume this is in reference to ‘Birdsong’ in the paper’s title and ‘Avian’ in Avian Vocalization Network. There is a brief discussion of how these methods are likely to perform on other commonly studied songbird species at the end of the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      A few points for the authors to consider that might strengthen or inform the paper:

      (1) In the public review, I detailed some ways in which the SSL+EMD approach is unlikely to be appreciably distinct from the VAE+MMD approach -- in fact, one could mix and match here. It would strengthen the authors' claim if they showed via experiments that their method outperforms VAE+MMD, but in the absence of that, a discussion of the relation between the two is probably warranted.  

      This comparison has been added to the paper.

      (2) ll. 305-310: This loss of accuracy near the edge is expected on general Bayesian grounds. Any regression approach should learn to estimate the conditional mean of the age distribution given the data, so ages estimated from data will be pulled inward toward the location of most training data. This bias is somewhat mitigated in the Brudner paper by a more flexible model, but it's a general (and expected) feature of the approach.
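
      The inward pull described in this point can be made explicit under a simplifying assumption that is not taken from the manuscript: suppose true age a and a single song feature x are jointly Gaussian with means \mu_a and \mu_x, standard deviations \sigma_a and \sigma_x, and correlation \rho. The regression estimate and its average for a bird of true age a are then:

```latex
\hat{a}(x) = \mathbb{E}[a \mid x] = \mu_a + \rho\,\frac{\sigma_a}{\sigma_x}\,(x - \mu_x),
\qquad
\mathbb{E}[\hat{a} \mid a] = \mu_a + \rho^2\,(a - \mu_a).
```

      Because \rho^2 < 1 unless the feature predicts age perfectly, the average estimate sits closer to the mean training age than the true age does, and the gap is largest for the youngest and oldest birds.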

      (3) While the online AVA documentation looks good, it might benefit from a page on design philosophy that lays out how the various modules fit together - something between the tutorials and the nitty-gritty API. That way, users would be able to get a sense of where they should look if they want to harness pieces of functionality beyond the tutorials.

      Thank you for this suggestion. We will add a page on AVN’s design philosophy to the online documentation. 

      (4) While the manuscript does compare AVN to packages like TweetyNet and AVA that share some functionality, it doesn't really mention what's been going on with the vocalpy ecosystem, where the maintainers have been doing a lot to standardize data processing, integrate tools, etc. I would suggest a few words about how AVN might integrate with these efforts.

      We thank the reviewer for this suggestion.

      (5) ll. 333-336: It would be helpful to provide a citation to some of the self-supervised learning literature this procedure is based on. Some citations are provided in methods, but the general approach is worth citing, in my opinion. 

      We have added a paragraph to the results section with more background on self-supervised learning for dimensionality reduction, particularly in the context of similarity scoring.

      (6) One software concern for medium-term maintenance: AVN docs say to use Python 3.8, and GitHub says the package is 3.9 compatible. I also saw in the toml file that 3.10 and above are not supported. It's worth noting that Python 3.9 reaches its end of life in October 2025, so some dependencies may have to be altered or changed for the package to be viable going forward.  

      Thank you for this comment. We will continue to maintain AVN and update its dependencies as needed.

      Minor points:

      (1) It might be good to note that WhisperSeg is a different install from AVN. May be hard for novice users, though there's a web interface that's available. 

      We’ve added a line to the methods section making this clear. 

      (2) Figure 6b: Some text in the y-axis labels is overlapping here. 

      This has been fixed. Thank you for bringing it to our attention. 

      (3) The name of the Python language is always capitalized.  

      We’ve fixed this capitalization error throughout the manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors improve the motivation of the chosen tasks and data or choose new tasks that more clearly speak to the optimizations they want to perform. 

      We have included more details about the motivation for our LDA classification analysis, age prediction model and embedding model for similarity scoring in the results of the revised manuscript, as discussed in more detail in the above responses to this reviewer. Thank you for these suggestions. 

      (2) They need to rigorously report the (classification) scores on the test datasets: these are the scores associated with the cost function used during training.  

      Based on this reviewer’s ‘Weaknesses: 3’ comment in the public reviews, we believe that they are referring to a classification score for the triplet loss model. As we explained in response to that comment, this is not a classification task; therefore, there is no classification score to report. The loss function used to train the model was a triplet loss function. While we could report these values, they are not informative for how well this approach would perform in a similarity scoring context, as explained above. As such, we prefer to include contrast index and tutor contrast index scores to compare the models’ performance for similarity scoring, as these are directly relevant to the task and are established in the field for said task.

      (3) They need to explain the reasons for the poor performance (or report on the inconsistencies with previous work) and why they prefer a fully automated system rather than one that needs some fine-tuning on bird-specific data.

      We’ve addressed this comment in the public response to this reviewer’s weakness points 3, 5, and 6. 

      (4) They should consider applying their method to data from Japanese and European labs.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 4.

      (5) The need to document the failure modes and report all details about the human annotations.  

      We’ve added additional description of the failure modes for our segmentation and labeling approaches in the results section of the revised manuscript.

      Details: 

      The introduction is very vague, it fails to make a clear case of what the problem is and what the approach is. It reads a bit like an advertisement for machine learning: we are given a hammer and are looking for a nail.  

      We thank the reviewer for this viewpoint; however, we disagree and have decided to keep our Introduction largely unchanged. 

      L46 That interpretability is needed to maximize the benefits of machine learning is wrong, see self-driving cars and chat GPT.  

      This line states that ‘To truly maximize the benefits of machine learning and deep learning methods for behavior analysis, their power must be balanced with interpretability and generalizability’. We firmly believe that interpretability is critically important when using machine learning tools to gain a deeper scientific understanding of data, including animal behavior data in a neuroscience context. We believe that the introduction and discussion of this paper already provide strong evidence for this claim. 

      L64 What about zebra finches that repeat a syllable in the motif, how are repetitions dealt with by AVN?  

      This is already described in the results section in lines 222-226, and in the methods in the ‘Syntax Features: Repetition Bouts’ section.

      L107 Say a bit more here, what exactly has been annotated?  

      We’ve added a sentence in the introduction to clarify this (lines 113-115). 

      L112 Define spectrogram frames. Do these always fully or sometimes partially contain a vocalization? 

      Spectrogram frames are individual time bins used to compute the spectrogram using a short-term Fourier transform. As described in the ‘Methods: Labeling: UMAP Dimensionality Reduction’ section, our spectrograms are computed using ‘The short term Fourier transform of the normalized audio for each syllable […] with a window length of 512 samples and a hop length of 128 samples’. Given that the song files have a standard sampling rate of 44.1 kHz, this means each time bin represents 11.6 ms of song data, with successive frames advancing in time by 2.9 ms. These contain only a small fraction of a vocalization. 
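
      The frame and hop durations quoted above follow directly from the window length, hop length, and sampling rate. A minimal sketch of that arithmetic (the function name `samples_to_ms` is ours, for illustration only):

```python
# Frame timing for a short-term Fourier transform (STFT).
# The three constants are the values quoted in the response above.
SAMPLE_RATE_HZ = 44_100   # standard sampling rate of the song files
WINDOW_LENGTH = 512       # samples per spectrogram frame
HOP_LENGTH = 128          # samples between the starts of successive frames


def samples_to_ms(n_samples: int, sample_rate_hz: int) -> float:
    """Convert a number of audio samples to a duration in milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


frame_ms = samples_to_ms(WINDOW_LENGTH, SAMPLE_RATE_HZ)
hop_ms = samples_to_ms(HOP_LENGTH, SAMPLE_RATE_HZ)

print(f"frame duration: {frame_ms:.1f} ms")  # 11.6 ms
print(f"hop duration:   {hop_ms:.1f} ms")    # 2.9 ms
```

      Because the hop is a quarter of the window, successive frames overlap by 75%.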

      L122 The reported TweetyNet score of 0.824 is lower than the one reported in Figure 2a.  

      The center line in the box plot in Figure 2a represents the median of the distribution of TweetyNet V-measure scores. Given that there are a couple of outlying birds with very low scores, the mean (0.824 as reported in the text of the results section) is lower than the median. This is not an error.
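
      The effect described above is easy to reproduce. The sketch below uses made-up per-bird scores (not the paper's data) in which two low outliers pull the mean below the median:

```python
from statistics import mean, median

# Hypothetical per-bird scores (NOT the paper's data): most birds score
# high, while two outlying birds score very low.
scores = [0.90, 0.91, 0.89, 0.92, 0.88, 0.90, 0.35, 0.40]

mean_score = mean(scores)
median_score = median(scores)

# The low outliers drag the mean down but barely move the median.
print(f"mean:   {mean_score:.3f}")
print(f"median: {median_score:.3f}")
```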

      L155 Some of the differences in performance are very small, reporting of the P value might be necessary. 

      These methods are unlikely to statistically significantly differ in their validation scores. This doesn’t mean that we cannot use the mean/median values reported to justify favoring one method over another. This is why we’ve chosen not to report p-values here.

      L161 The authors have not really tested more than a single clustering method, failing to show a serious attempt to achieve good performance.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 2.

      L186 Did isolate birds produce stereotyped syllables that can be clustered? 

      Yes, they did. The validation for clustering of isolate bird songs can be found in Figure 2–figure supplement 4. 

      Fig. 3e: How were the multiple bouts aligned?

      This is described in lines 857-876 in the ‘Methods: Song Timing Features: Rhythm Spectrograms’ section of the paper.

      L199 There is a space missing in front of (n=8).  

      Thank you for bringing this to our attention. It’s been corrected in the updated manuscript. 

      L268 Define classification accuracy.  

      We’ve added a sentence in lines 953-954 of the methods section defining classification accuracy. 

      L325 How many motifs need to be identified, why does this need to be done manually? There are semiautomated methods that can allow scaling, these should be  cited here. Also, the mention of bias here should be removed in favor of a more extensive discussion on the experimenter bias (traditionally vs Texas bias (in this paper).  

      All of the methods cited in this line have graphical user interfaces that require users to select a file containing song and manually highlight the start and end of each motif to be compared. The exact number of motifs required varies depending on the specific context (e.g. more examples are needed to detect more subtle differences or changes in song similarity) but it is fairly standard for reviewers to score 30 – 100 pairs of motifs. 

      We’ve discussed the tradeoffs between full automation and supervised or human-in-the-loop methods in response to this reviewer’s public comment ‘weakness #5 and 6’. Briefly, AVN’s aim is to standardize song analysis, to allow direct comparisons between song features and similarity scores across research groups. We believe, as explained in the paper, that this can be best achieved by having different research groups use the same deep learning models, which perform consistently well across those groups. Introducing semi-automated methods would defeat this benefit of AVN. 

      We’ve also addressed the question of ‘Texas bias’ in response to this reviewer’s public comment ‘Weakness #4’. 

      L340 How is EMD applied? Syllables are points in 8-dim space, but now suddenly authors talk about distributions without explaining how they got from points to distributions. Same in L925.  

      We apologize for the confusion here. The syllable points in the 8-d space are collectively an empirical distribution, not a probability distribution. We referred to them simply as ‘distributions’ to limit technical jargon in the results of the paper, but have changed this to more precise language in the revised manuscript.

      L351 Why do authors now use 'contrast index' to measure performance and no longer 'classification accuracy'?  

      We’ve addressed this comment in the public response to this reviewer’s weakness points 1 and 2.

      Figure 6 What is the confusion matrix, i.e. how well can the model identify pupil-pupil pairings from pupil-tutor and from pupil-unrelated pairings? I guess that would amount to something like classification accuracy.  

      There is no model classifying comparisons as pupil-pupil vs. pupil-tutor etc. These comparisons exist only to show the behavior of the similarity scoring approach, which consists of a dissimilarity measure (MMD or EMD) applied to low dimensional representations of syllables generated by the triplet loss model or VAE. This was clarified further in our public response to this reviewer’s weakness points 1 and 2. 

      L487 What are 'song files', and what do they contain?   

      ‘Song files’ are .wav files containing recordings of zebra finch song. They typically contain a single song bout, but they can include multiple song bouts if they are produced close together, or incomplete song bouts if the introductory notes were very soft or the bouts were very long (>30s from the start of the file). Details of these recordings are provided in the ‘Methods: Data Acquisition: UTSW Dataset’ section of the manuscript.

      L497 Calls were only labelled for tweetynet but not for other tasks.  

      That is correct. The rationale for this is provided in the ‘Methods: Manual Song Annotation’ section of the manuscript. 

      L637 There is a contradiction (can something be assigned to the 'own manual annotation category' when the same sentence states that this is done 'without manual annotation'?) 

      We believe there is confusion here between automated annotation and validation. Any bird can be automatically annotated without the need for any existing manual annotations for that individual bird. However, manual labels are required to compare automatically generated annotations against for validation of the method.

      L970 Spectograms of what? (what is the beginning of a song bout, L972). 

      The beginning of a song bout is the first introductory note produced by a bird after a period without vocalizations. This is standard.

    1. Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      Weaknesses:

      The data presented is limited to the cortex, and subcortical contributions would be interesting to explore. Further, the temporal window around event boundaries of 20 seconds is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively distal from event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two different types of boundaries, which would be valuable.

    1. Google Developer Program Plans & Pricing: choose the plan that's right for you (monthly or yearly). Standard (no cost): Gemini Code Assist for Individuals; Gemini CLI (via Gemini Code Assist); 10 Firebase Studio workspaces; Gemini in documentation and tools; 35 monthly Google Cloud Skills Boost credits; invitations to communities and events; preferred access to private previews; engage in Google Developer forums; showcase your skills; save interests and bookmark pages; news and announcements. Premium (€22 EUR/month, recommended for individuals, free trial for 1 month): Gemini Code Assist Standard; Gemini CLI (via Gemini Code Assist); 30 Firebase Studio workspaces; 1 month of Google AI Pro; $45 GenAI and Cloud monthly credit; $500 GenAI and Cloud credit bonus; 1 Google Cloud certification voucher; 100 monthly Google Cloud Skills Boost credits; consultation with Google Cloud experts; ...and other standard benefits. Enterprise (preview, recommended for teams): Gemini Code Assist Enterprise; Gemini CLI (via Gemini Code Assist); Google Cloud developer sandboxes; $150 Google Cloud monthly credit; centralized purchasing management.

      GMap

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy-equal route averaging-quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.
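
      The dilution mentioned in this summary can be illustrated with a small sketch. It assumes that each pair's route is the equal average of the experienced bird's current route and the naive bird's route, and it uses a chain of five birds and the inverse-Simpson index as an "effective number" summary. The chain length and the index are assumptions for illustration and are not necessarily the authors' definition of effective group size.

```python
def chain_weights(n_birds: int) -> list[float]:
    """Weight of each bird's solo route in the final collective route,
    assuming the pair's route is always the equal average of the
    experienced bird's current route and the naive bird's route."""
    weights = [1.0]  # generation 1: a single bird flying alone
    for _ in range(n_birds - 1):
        # Each new pairing halves every earlier contribution and gives
        # the newly added bird a weight of one half.
        weights = [w / 2 for w in weights] + [0.5]
    return weights


def inverse_simpson(weights: list[float]) -> float:
    """One common 'effective number' summary: 1 / sum of squared weights.
    Hypothetical choice; the manuscript may define this differently."""
    return 1.0 / sum(w * w for w in weights)


weights = chain_weights(5)
print(weights)                   # [0.0625, 0.0625, 0.125, 0.25, 0.5]
print(inverse_simpson(weights))  # effective number of contributing birds
```

      Under these assumptions the first bird's route contributes only 1/16 of the final route, and the five-bird chain behaves like a group of roughly three equally weighted birds.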

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The lack of availability of codes and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. The link to our code and data on figshare can be found here: (https://doi.org/10.6084/m9.figshare.28950032.v1). We will further add this link to the Data Availability Statement of our revised version.  

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works-they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages-such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms-it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences). Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and the biological basis of these mechanisms cannot be determined through the navigational data alone. For instance, it remains unexplored whether pigeons are adapting their behavior based only on social cues from their partners or using other navigational features such as landmarks or roads, location of the sun, geomagnetic cues or prior learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ where the bird pairs are behaving as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with any confidence only in a meta sense. Considering this, we believe that our analysis is a more top-down approach towards describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer, 2024 [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. Considering these points, we will make changes in our revised version to clearly elaborate on what the definition of ‘mechanism’ should include, in line with the reviewer’s feedback.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised-and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used for providing evidence of CCE in non-human animals. In our literature review, we found that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), with regard to individuals being able to compare strategies and select those resulting in higher individual fitness. This preferential selection of strategies, termed innovations, allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more of a statistical outcome than a deliberate individual judgement. We believe that this strategy is also amenable to certain task types (which in our study was homing route choice) and may change for others (for example, solving a puzzle box), and that the task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which ones don’t. Keeping this framework in mind, studies should clearly state which definition of CCE they are using, and should be critically evaluated for their underlying task type and cognitive mechanisms before being deemed CCE. Considering these points, we will expand our discussion to highlight these key questions, which could be critical to consider in future research.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. We may like to ride roller coasters in part because our genetic code has given us a thrill-loving personality and in part because we’ve had some really great times on roller coasters in the past. Still other attitudes are learned via the media (Hargreaves & Tiggemann, 2003; Levina, Waldo, & Fitzgerald, 2000) or through our interactions

      Example 2

    1. Shopify, Mobile SDKs (Flutter, Native iOS/Android, React Native) and Razorpay no-code solutions, such as Payment Links, Payment Pages, Invoices and Payment Buttons.

      Hosted checkouts: Shopify, Mobile SDKs (Flutter, Native iOS, React Native) and Razorpay no-code solutions, such as Payment Links, Payment Pages, Payment Handles (razorpay.me), and Invoices.

    2. Watch Out!

      Add one more point: file hosting must be done on websites where Razorpay Checkout (the Razorpay iframe) loads as an overlay on your website. Give an illustration. This is required, or else Apple Pay payments will not work from such websites. This includes WooCommerce, Magento and other website platforms.

      However, for the merchant, Shopify, SDKs (Flutter, native, and React Native), and other Razorpay no-code apps (Payment Links, Payment Pages, Invoices, Payment Handles, etc.) will work without any file hosting.

      It is important to note that if you have both, you are required to host a file.

    1. . The city became center of a new empire and its most famous ruler, Hammurabi, reigned between 3,792 and 3,750 years ago. Hammurabi is remembered for his law code, another ancient text written about 3,755 years ago. T

      Wow, Babylon was such an important city! Its most famous ruler, Hammurabi, ruled almost 3,800 years ago, and he’s still remembered today for his amazing law code. It’s crazy to think that something written so long ago could shape how people lived and governed back then!

    1. How do you divide out responsibility for a bots actions between the person writing the code and the person running the program?

      To divide the responsibility for a bot's actions between the person writing the code and the person running the code, you must take intent into account. Anything can be used for good or for evil; what matters is what the person behind the action intends to do and why they want to do it. For example, if someone creates a bot to help people find research sources for a project and then someone uses it to find illegal things on the internet, it's the responsibility of the person who ran the program because they chose to abuse the program. However, to protect themselves, the person writing the code could also add safeguards to the code to prevent abusive usage. Yet the responsibility in the example I provided still falls onto the person who chose to use the bot for 'evil', because they are the person actively seeking the 'evil.'

    2. When one of us ran the program, who made those posts (me? you? the bot?)?

      I think that the post was made as a collective effort from everyone. The person who wrote the code is responsible for creating the framework within which the post was made. I'm responsible for the details of the code, as I put in my login information and made the bot say what I wanted. And the bot is also partly responsible, as it carried out the actions and was ultimately the reason that a post was made.

    1. In the contemporary era, both print and electronic texts are deeply interpenetrated by code. Digital technologies are now so thoroughly integrated with commercial printing processes that print is more properly considered a particular output form of electronic text than an entirely separate medium. Nevertheless, electronic text remains distinct from print in that it literally cannot be accessed until it is performed by properly executed code. The immediacy of code to the text's performance is fundamental to understanding electronic literature, especially to appreciating its specificity as a literary and technical production. Major genres in the canon of electronic literature emerge not only from different ways in which the user experiences them but also from the structure and specificity of the underlying code.

      Digital literature is not a devaluation, but an evolution of the word. Like printing in the past, it does not destroy the previous form, but enriches it, creating a hybrid space. The key revolution is in rethinking the text itself, which now includes code, interface, and multimedia as its integral parts. This phenomenon is born at the intersection of art and technology, reflecting the diversity of modern digital culture.

    2. The multimodality of digital art works challenges writers, users, and critics to bring together diverse expertise and interpretive traditions to understand fully the aesthetic strategies and possibilities of electronic literature.

      Katherine Hayles considers an electronic literary text as something independent, possessing its own materiality, rather than a new interpretation of a printed book. And this materiality is primarily created by the code that is used to create it. Such literature is a hybrid phenomenon that exists at the intersection of literature, game mechanics, visual art, and programming. And it is precisely this feature that requires the creation of fundamentally new approaches and tools for criticism.

    3. The intermixture of code and language on which recombinant flux depends is situated within a more general set of practices in which human thinking and machine execution collaborate to produce literary works that reference both cognitive modes.

      One feature of electronic literature that could, in my opinion, be used to define this phenomenon fully is the tension between the language of an individual and that of a machine. Human language presupposes self-reflection, i.e., a metaphysical ability that creates the Subject-Object dichotomy. Language, therefore, is a sort of mirror that displays consciousness; the individual literally observes their own self by letting the self express itself. Computer language, however, is a shard of that original reflective ability that is "alive" by means of a specific algorithm. The algorithm sets borders and rules on how to operate within these established borders. Code reflects someone else's ideas rather than its own. Thus, when code and language converge, two possibilities arise: first, code can be used to manifest metaphysics; second, code can use human language to project its own logic and express it in terms that only pretend to be metaphysical. This second scenario illustrates a different dichotomy: Object-Object. This tension illustrates the intricate nature of electronic literature and its relation to human consciousness. Thus, electronic literary works can be used either to awaken the Subject or to put it into slumber.

    4. Readers come to digital work with expectations formed by print, including extensive and deep tacit knowledge of letter forms, print conventions, and print literary modes. Of necessity, electronic literature must build on these expectations even as it modifies and transforms them. At the same time, because electronic literature is normally created and performed within a context of networked and programmable media, it is also informed by the powerhouses of contemporary culture, particularly computer games, films, animations, digital arts, graphic design, and electronic visual culture. In this sense electronic literature is a "hopeful monster"

      With the rapid spread of technology in our daily lives through various gadgets, literature as a significant source of information has inevitably moved into the digital realm. Katherine Hayles accurately noted that new forms of "modern culture" necessitated new forms of text, for example, characters' lines in a computer game. As stated in the definition, literature has abandoned its printed form in favor of a computer-based code shell. The author also addressed how progress in the genres of electronic literature is bound to progress in the technologies themselves. This is illustrated by the expansion of hypertext fiction forms and deeper immersion in interactive fiction. In general, we can conclude that electronic literature is a natural successor to printed literature.

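    1. We have prepared the following sample code

      MUST: The code has disappeared. In the part below, the line break after python has been removed

      ```{code-block} python :caption: A sample with connected tasks, using a restaurant as an example

      The same applies from here on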

    1. In the code editor below, revise the query to select the last_statement column in addition to the existing columns. Once you're done, you can hit Shift+Enter to run the query.

      Select the columns first name and last name from the table with a limit of 3 rows.
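
The query described above can be sketched with Python's built-in `sqlite3` module and an in-memory database. This is only an illustration under stated assumptions: the table name `executions`, the column spellings `first_name` and `last_name`, and the sample rows are all invented here; only the `last_statement` column and the limit of 3 rows come from the text.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE executions (first_name TEXT, last_name TEXT, last_statement TEXT)"
)
conn.executemany(
    "INSERT INTO executions VALUES (?, ?, ?)",
    [
        ("Ada", "Smith", "first sample statement"),
        ("Ben", "Jones", "second sample statement"),
        ("Cal", "Brown", "third sample statement"),
        ("Dee", "White", "fourth sample statement"),
    ],
)

# The existing columns plus last_statement, limited to 3 rows.
query = """
SELECT first_name, last_name, last_statement
FROM executions
LIMIT 3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Adding a column to the `SELECT` list widens each returned row, while `LIMIT 3` caps how many rows come back regardless of the table's size.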

    1. exercises

      There's an error in this sentence, isn't there? To start the exercise, there are exercises that will help us set up our code?

    1. from the mid-1990s the trend reverses, with legislation being toughened each time a right-wing majority returns to power and only partially corrected when the left governs. Thus in 1994 judicial retention, in other words police custody for those under 13, is instituted; in 1996 immediate appearance and appearance before the juvenile judge without prior investigation are permitted; in 2002 closed educational centres and penal institutions for minors are created, and the age of criminal responsibility is lowered from 13 to 10, authorising sanctions much earlier in life (the Code of Juvenile Criminal Justice will in fact restore the limit of 13 in 2021). In 2007 the exceptions allowing the excuse of minority not to be applied to minors over 16 are widened; these provisions will, however, be repealed in 2014, the full excuse of minority then being restored. Also in 2007, sentence mitigation is abolished for minors aged 16 in the event of a second repeat offence if they commit an offence involving violence or sexual assault. In 2011 juvenile correctional courts are created to try repeat offences punishable by more than 3 years' imprisonment committed by adolescents over 16; they will, however, be abolished in 2016. In 2019 home detention under electronic surveillance is made applicable to minors over 13. Gradually, then, with the back-and-forth I have described to you, the legislator erodes the protective principle of the ordinance of 1945, restricts the effects of the presumption of lack of discernment and of the excuse of minority, multiplies the places of confinement and the corresponding sentencing options, and brings juvenile criminal justice closer to adult criminal justice. And you will certainly have noticed that this is a debate that is back on the table today.
    2. schools therefore do not entirely escape the punitive moment. In reality, rather than asking whether discipline there is more or less severe than in the past, one should ask how discipline is constantly being reconfigured. The ban on corporal punishment, once prevalent, may thus coincide with the appearance of new grounds for sanction, notably under the principle of secularism (laïcité) as defined in the law of 15 March 2004. Thus, for the 2022-2023 school year, 3,881 reports were sent to the Ministry of National Education, about half of which concern, I quote, "outfits that do not by nature manifest a religious affiliation, such as long skirts or dresses", in the words of the reviews carried out by the so-called Équipes académiques Valeurs de la République (EAVR) and their 1,200 trainers. The number of corresponding sanctions is unknown; these may be disciplinary within the school and even penal under the Code de l'éducation, which provides for a fine of €150, raised to €200 for a repeat offence.
    1. School and the Values of the Republic: Synthesis of the Study Day

      Summary

      This synthesis document analyses the key themes, arguments and data presented at the study day entitled "L'École, un territoire vivant au cœur des valeurs de la République" ("School, a living territory at the heart of the values of the Republic").

      Organised by the INSPÉ of the Académie de Lille, the day took place in a context marked by the Arras attack, which gave the debates particular urgency.

      The speakers unanimously stressed the School's primary mission, enshrined in the Code de l'éducation, of sharing the values of the Republic.

      This mission is rooted in a deep historical heritage, running from the Enlightenment to the Jules Ferry laws, and aims to form citizens emancipated through reason and knowledge.

      A semantic and legal analysis shows that the notion of "values of the Republic" is of recent use, in public discourse as well as in legal texts, with a marked increase since the 1980s.

      These values are not fixed; they evolve and are enriched, as shown by the integration of the fight against discrimination.

      The law gives them no constitutional definition, and they are mentioned mainly in the Code de l'éducation and in immigration law.

      On the pedagogical level, a consensus emerges on the need to move beyond a "pedagogy of prescription" to reach a "pedagogy of conviction".

      This "citizen approach" rejects inculcation and promotes critical thinking, everyday experience of the values, and cooperation.

      The aim is to enable pupils not only to know the values but to "experience" them and feel their benefit, turning the school into an "ecosystem of values".

      Finally, the discussions highlight contemporary challenges: the growing burden placed on the school as an institution, communitarianism, relativism, and the need not to deny reality while presenting the values as an ideal to be won.

      The gap between the value and reality is presented not as a failure but as the very space of civic engagement.

      1. The School's Fundamental Mission in a Context of Crisis

      The speakers' introductory remarks unanimously recalled the central, founding role of the School in transmitting republican values, a mission made even more crucial by the contemporary context.

      1.1 A Legal and Historical Foundation

      The School's mission is clearly defined by article L111-1 of the Code de l'éducation, cited several times, which states that "the nation sets as the school's primary mission to share the values of the Republic with pupils".

      This mission is not a mere "supplément d'âme" (an optional extra) but a professional obligation that forms the framework of the republican project.

      The speakers placed this mission in historical depth:

      The Enlightenment and the Revolution: Alain Frugère evoked the spirit of the Enlightenment (Molière), Condorcet's plan for public instruction (1792), which establishes the primacy of knowledge derived from research over opinions and beliefs, and the "wager on emancipatory reason".

      The 19th century: Madame Looher recalled the project of the republicans of the Third Republic (Gambetta, Ferdinand Buisson) to stabilise the regime through education, culminating in the Jules Ferry laws of 1881-82, which establish schooling that is free, compulsory and secular.

      1.2 The Weight of the Current Context

      The study day, although planned long in advance, was deeply marked by the murder of Dominique Bernard in Arras.

      This event gave a "quite particular colouring" to the reflections, as Sébastien Jaibovski pointed out.

      This context brings several tensions to light:

      The Burden on the Institution: Sébastien Jaibovski raised the question of the "burden, which today is very heavy, perhaps too heavy" that society places on the School and its teachers.

      The Permanent Conquest: He also insisted that "values are never acquired once and for all; they always remain to be won".

      Societal Challenges: Alain Frugère mentioned "withdrawal into oneself, communitarianism, intolerance and even hatred" as daily challenges, while Mathieu Clouet listed social inequalities, the effects of the media economy, and relativism.

      2. Analysis of the Notion of "Values of the Republic"

      The talk by Ismaël Ferrat, university professor, offered a detailed lexical and legal analysis, showing that the notion of "values of the Republic" is at once complex, evolving and of recent emergence.

      2.1 A Recent Appearance in Public and Legal Discourse

      Contrary to a common assumption, use of the phrase "valeurs de la République" is a recent phenomenon.

      • In publications: An analysis of digitised text corpora (Google Books) and of the archives of the newspaper Le Monde shows the term to be almost absent until the 1980s, followed by an "explosion" in its use from 1989 onwards.

      • In law: The term occurs very rarely in French legal codes in the early 2000s and rises sharply from 2016.

      This increase is mainly due to two codes:

        1. The Code de l'éducation.
        1. The Code de l'entrée et du séjour des étrangers et du droit d'asile.

      2.2 A Legal Definition That Is Absent and Evolving

      The legal analysis reveals a paradox: although the notion is used more and more, it remains legally elusive.

      Absence of a constitutional definition: No constitutional text defines precisely what the values of the Republic are. The Conseil constitutionnel has produced no study on the subject.

      The opinion of the Conseil d'État: Consulted on the "separatism" bill, the Conseil d'État judged the notion of "values" too broad to serve as a generalisable legal principle, preferring that of "republican principles".

      Evolving values: The list of values is not fixed. The fight against discrimination, for example, is a value now taken for granted, whereas the first article of the Code pénal on the subject dates only from 1994.

      3. The Pedagogical Approach: From Prescription to Conviction

      Mathieu Clouet, representing the Équipe académique Valeurs de la République, developed the concept of a "citizen approach to values at school", distinguished by its rejection of inculcation in favour of reasoned adherence.

      3.1 Rejecting Inculcation, Aiming for Conviction

      The aim is not only to make the values known but to "share" them.

      Pedagogy of conviction: "We cannot be content with a pedagogy of prescription; we must find the path to a pedagogy of conviction."

      Appeal to reason: This approach rests on education for freedom, calls on critical thinking, and teaches pupils to question the values themselves.

      The three dimensions of a value: The approach must take into account the following dimensions:

      • intellectual (content),

      • psycho-affective (feeling), and

      • conative (action).

      3.2 The School as an "Ecosystem of Values"

      For the values to matter to pupils, pupils must "experience" them, that is, feel their benefit and test their reality.

      The role of knowledge: Transmitting knowledge contributes to education in values. Citing Catherine Kintzler, Mathieu Clouet speaks of the "liberating power of teaching": mastering a body of knowledge is a concrete experience of freedom.

      Lived experience: Education in values also involves cooperation, taking responsibility, and participatory practices. The school must be a place where values are embodied daily, to avoid gaps between discourse and reality.

      Reversing the lens: It is suggested that negative facts (discrimination, racism) be placed within the wider perspective of the struggle for equality.

      The example of the Dreyfus affair is used to show that the France of the time was not only the France of antisemitism but also the only country in Europe where intellectuals rose to defend a Jew.

      4. Case Study: Teaching Laïcité

      Ismaël Ferrat illustrated what is at stake in transmitting values through the example of laïcité (secularism), analysing how it is treated in school curricula.

      | Period | Occurrence of the word "laïcité" in curricula (primary/lower secondary) | Context and issues |
      |---|---|---|
      | 1970s-1980s | Almost absent | Laïcité is taken for granted, a "non-notion" in pedagogical terms. |
      | 1990s-2000s | Sharp increase | Its emergence is linked to the need to explain the rules, notably following the Creil headscarf affair (1989) and the Bayrou circular (1994) on conspicuous religious symbols. |
      | Since 2013 (Peillon law) | Presence stabilised at a high level | A pupil in school today encounters the notion about 13 times between primary and lower secondary school. |

      The pedagogical challenge is twofold:

        1. Explaining the principle: Giving the keys to understanding a fundamental value.
        1. "De-conflictualising": Preventing the principle from being perceived by some pupils, notably those of Muslim culture, as being directed "against Islam".

      The results are convincing: a Cnesco study shows that 90% of pupils in 3e (the final year of lower secondary school) and 80% of lycée pupils in terminale (the final year) have already covered laïcité in class and have a good overall grasp of the notion. This demonstrates the effectiveness of the work done in the classroom.

      5. Conclusion: The Value as Commitment and "Refusal of Reality"

      The study day concludes with a demanding but determined vision of the School's mission.

      Transmitting the values of the Republic is not the imposition of a dogma but an invitation to take part in a collective project of "perpetual democratic reinvention".

      As Mathieu Clouet put it, one must remember that "a value is not only a reflection of reality; a value is also a refusal of reality".

      The gap between the ideal upheld by the value (equality, fraternity) and the imperfections of society is not a sign of failure.

      On the contrary, "it is precisely in this gap that we can find the means to give the pupils in our care the will to act and to commit themselves within the French Republic".

      The citizen approach to values is therefore, ultimately, proof of the civic commitment of the whole educational community.

    1. Organizes code into classes and objects. Supports encapsulation to group data and methods together. Enables inheritance for reusability and hierarchy. Allows polymorphism for flexible method implementation. Improves modularity, scalability, and maintainability.

      test
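
The object-oriented features listed in the highlighted passage can be illustrated with a short Python sketch. The class names (`Shape`, `Circle`, `Rectangle`) and the `total_area` helper are hypothetical, chosen only for this example; they do not come from the annotated page.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Base class: bundles shared state and behaviour (encapsulation)."""

    def __init__(self, name: str) -> None:
        self._name = name  # leading underscore marks internal state

    @property
    def name(self) -> str:
        return self._name

    @abstractmethod
    def area(self) -> float:
        """Each subclass supplies its own implementation."""

    def describe(self) -> str:
        return f"{self.name} with area {self.area():.2f}"


class Circle(Shape):
    """Inheritance: reuses Shape's name handling and describe()."""

    def __init__(self, radius: float) -> None:
        super().__init__("circle")
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self) -> float:
        return self._width * self._height


def total_area(shapes: list[Shape]) -> float:
    # Polymorphism: the same area() call dispatches to each concrete class.
    return sum(shape.area() for shape in shapes)


shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
for shape in shapes:
    print(shape.describe())
print(total_area(shapes))
```

Adding a new shape means writing one more subclass; `describe` and `total_area` keep working unchanged, which is the modularity and maintainability benefit the passage refers to.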

    1. After some time, I also realized that if design was problem solving, then we all design to some degree.

      The way that I see it is similar to this observation. I feel like design is how a person uses their creativity to solve problems. It's like a bridge between technical problem solving and creative thinking. As someone that comes from a computer science background, working with front-end code for the first time was enlightening because it felt like I was coding and doing an art project at the same time.

    1. Reviewer #1 (Public review):

      Summary

      The manuscript by K.H. Lee et al. presents Spyglass, a new open-source framework for building reproducible pipelines in systems neuroscience. The framework integrates the NWB (Neurodata Without Borders) data standard with the DataJoint relational database system to organize and manage analysis workflows. It enables the construction of complete pipelines, from raw data acquisition to final figures. The authors demonstrate its capabilities through examples, including spike sorting, LFP filtering, and sharp-wave ripple (SWR) detection. Additionally, the framework supports interactive visualizations via integration with Figurl, a platform for sharing neuroscience figures online.

      Strengths:

      Reproducibility in data analysis remains a significant challenge within the neuroscience community, posing a barrier to scientific progress. While many journals now require authors to share their data and code upon publication, this alone does not ensure that the code will execute properly or reproduce the original results. Recognizing this gap, the authors aim to address the community's need for a robust tool to build reproducible pipelines in systems neuroscience.

      Weaknesses:

      The issues identified here may serve as a foundation for future development efforts.

      (1) User-friendliness:

      The primary concern is usability. The manuscript does not clearly define the intended user base within a modern systems neuroscience lab. Improving user experience and lowering the barrier to entry would significantly enhance the framework's potential for broad adoption. The authors provide an online example notebook and a local setup notebook. However, the local setup process is overly complex, with many restrictive steps that could discourage new users. A more streamlined and clearly documented onboarding process is essential. Additionally, the lack of Windows support represents a practical limitation, particularly if the goal is widespread adoption across diverse research environments.

      (2) Dependency management and long-term sustainability:

      The framework depends on numerous external libraries and tools for data processing. This raises concerns about long-term maintainability, especially given the short lifespan of many academic software projects and the instability often associated with Python's backward compatibility. It would be helpful for the authors to clarify how flexible and modular the pipeline is, and whether it can remain functional if upstream dependencies become deprecated or change substantially.

      (3) Extensibility for custom pipelines:

      A further limitation is the insufficient documentation regarding the creation of custom pipelines. It is unclear how a user could adapt Spyglass to implement their own analysis workflows, especially if these differ from the provided examples (e.g., spike sorting, LFP analysis that are very specific to the hippocampal field). A clearer explanation or example of how to extend the framework for unrelated or novel analyses would greatly improve its utility and encourage community contributions.

      (4) Flexibility vs. Standardization:

      The authors may benefit from more explicitly defining the intended role of the framework: is Spyglass designed as a flexible, general-purpose tool for developing custom data analysis pipelines, or is its primary goal to provide a standardized framework for freezing and preserving pipelines post-publication to ensure reproducibility? While both goals are valuable, attempting to fully support both may introduce unnecessary complexity and result in a tool that is not well-suited for either purpose. The manuscript briefly touches on this tradeoff in the introduction, and the latter, pipeline preservation, may be the more natural fit for the package. If so, this intended use should be clearly communicated in the documentation to help users understand its scope and strengths.

      Impact:

      This work represents a significant milestone in advancing reproducible data analysis pipelines in neuroscience. Beyond reproducibility, the integration of cloud-based execution and shareable, interactive figures has the potential to transform how scientific collaboration and data dissemination are conducted. The authors are at the forefront of this shift, contributing valuable tools that push the field toward more transparent and accessible research practices.

    1. Reviewer #3 (Public review):

      Bogdan et al. present an intriguing and timely investigation into the intrinsic dynamics of prediction error (PE)-related brain states. The manuscript is grounded in an intuitive and compelling theoretical idea: that the brain alternates between high and low PE states even at rest, potentially reflecting an intrinsic drive toward predictive minimization. The authors employ a creative analytic framework combining different prediction tasks and imaging modalities. They shared open code, which will be valuable for future work.

      However, the current manuscript would benefit from further clarification and empirical grounding, especially with regard to its theoretical framing (that PE-like state fluctuations are intrinsic and help us minimize PE), interpretation of results, and broader functional significance. Below, I outline a few major comments and suggestions that I think would strengthen the contribution.

      (1) Consistency in Theoretical Framing

      The title, abstract, and introduction suggest inconsistent theoretical goals of the study.

      The title suggests that the goal is to test whether there are intrinsic fluctuations in high and low PE states at rest. The abstract and introduction suggest that the goal is to test whether the brain intrinsically minimizes PE and whether this minimization recruits global brain networks. My comments here are that a) these are fundamentally different claims, and b) both are challenging to falsify. For one, task-like recurrence of PE states during resting might reflect the wiring and geometry of the functional organization of the brain emerging from neurobiological constraints or developmental processes (e.g., experience), but showing that mirroring exists because of the need to minimize PE requires establishing a robust relationship with behavior or showing a causal effect (e.g., that interrupting intrinsic PE state fluctuations affects prediction).

      The global PE hypothesis ("PE minimization is a principle that broadly coordinates brain functions of all sorts, including abstract cognitive functions") is more suitable for discussion than as the main claim in the abstract, introduction, and throughout the paper.

      Given the above, I recommend that the authors clarify and align their core theoretical goals across the title, abstract, introduction, and results. If the focus is on identifying fluctuations that resemble task-defined PE states at rest, the language should reflect that more narrowly, and save broader claims about global PE minimization for the discussion. This hypothesis also needs to be contextualized within prior work. I'd like to see if there is similar evidence in the literature using animal models.

      (2) Interpretation of PE-Related Fluctuations at Rest and Its Functional Relevance

      It would strengthen the paper to clarify what is meant by "intrinsic" state fluctuations. Intrinsic might mean task-independent, trait-like, or spontaneously generated. Which do the authors mean here? Is the key prediction that these fluctuations will persist in the absence of a prediction task?

      Regardless of the intrinsic argument, I find it challenging to interpret the results as evidence of PE fluctuations at rest. What the authors show directly is that the degree to which a subset of regions within a PE network discriminates high vs. low PE during task correlates with the magnitude of separation between high and low PE states during rest. While this is an interesting relationship, it does not establish that the resting-state brain spontaneously alternates between high and low PE states, nor that it does so in a functionally meaningful way that is related to behavior. How can we rule out brain dynamics of other processes, such as arousal, that also rise and fall with PE? I understand the authors' intention to address the reverse inference concern by testing whether "a participant's unique connectivity response to PE in the reward-processing task should match their specific patterns of resting-state fluctuation". However, I'm not fully convinced that this analysis establishes the functional role of the identified modules to PE because of the following:

      Theoretically, relating the activities of the identified modules directly to behavior would demonstrate a stronger functional role.

      a) Across participants: Do individuals who exhibit stronger or more distinct PE-related fluctuations at rest also perform better on tasks that require prediction or inference? This could be assessed using the HCP prediction task, though if individual variability is limited (e.g., due to ceiling effects), I would suggest exploring a dataset with a prediction task that has greater behavioral variance.

      Or even more broadly, does this variability in resting state PE state fluctuations predict general cognitive abilities like WM and attention (which the HCP dataset also provides)? I appreciate the inclusion of the win-loss control, and I can see the intention to address specificity. This would test whether PE state fluctuations reflect something about general cognition, but also above and beyond these attentional or WM processes that we know are fluctuating.

      b) Within participants: Do momentary increases in PE-network expression during tasks relate to better or faster prediction? In other words, is there evidence that stronger expression of PE-related states is associated with better behavioral outcomes?

      (3) A Priori Hypothesis for EEG Frequency Analysis

      It's unclear how to interpret the finding that fMRI fluctuations in the defined modules correlate with frontal Delta/Theta power, specifically in the 3-6 Hz range. In the EEG literature, this frequency band is most commonly associated with low arousal, drowsiness, and mind wandering in resting, awake adults, not uniquely with prediction error processing. An a priori hypothesis is lacking here: what specific frequency band would we expect to track spontaneous PE signals at rest, and why? Without this, it is difficult to separate a PE-based interpretation from more general arousal or vigilance fluctuations.

      (4) Significance Assessment

      The significance of the correlation above and all other correlation analyses should be assessed through a permutation test rather than a single parametric t-test against zero. There are a few reasons: a) EEG and fMRI time series are autocorrelated, violating the independence assumption of parametric tests; b) standard t-tests can underestimate the true null distribution's variance, because EEG-fMRI correlations often involve shared slow drifts or noise sources, which can yield spurious correlations and inflate false positives unless tested against an appropriate null.

      Building a null distribution that preserves the slow drifts, for example, would help us understand how likely it is for the two time series to be correlated when the slow drifts are still present, and how much better the current correlation is, compared to this more conservative null. You can perform this by phase randomizing one of the two time courses N times (e.g., N=1000), which maintains the autocorrelation structure while breaking any true co-occurrence in patterns between the two time series, and compute a non-parametric p-value. I suggest using this approach in all correlation analyses between two time series.
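
The phase-randomization procedure suggested above could be sketched roughly as follows. This is a minimal illustration assuming NumPy, not the authors' pipeline: the function names (`phase_randomize`, `surrogate_p_value`, `ar1`), the synthetic AR(1) signals, and the two-sided p-value with a +1 correction are all choices made for this example.

```python
import numpy as np


def phase_randomize(x, rng):
    """Surrogate with the same amplitude spectrum as x but randomized phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0  # leave the DC term alone so the mean is preserved
    if n % 2 == 0:
        phases[-1] = 0.0  # the Nyquist bin must stay real for even lengths
    # Rotating each Fourier coefficient keeps its magnitude, and therefore
    # the autocorrelation structure, while scrambling timing.
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


def surrogate_p_value(x, y, n_surrogates=1000, seed=0):
    """Two-sided non-parametric p-value for corr(x, y) against a surrogate null."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    null = np.empty(n_surrogates)
    for i in range(n_surrogates):
        null[i] = np.corrcoef(phase_randomize(x, rng), y)[0, 1]
    # The +1 terms count the observed value itself, so p is never exactly 0.
    exceed = np.sum(np.abs(null) >= abs(observed))
    return observed, (exceed + 1) / (n_surrogates + 1)


def ar1(n, phi, rng):
    """Synthetic autocorrelated series standing in for a slow-drifting signal."""
    noise = rng.standard_normal(n)
    out = np.empty(n)
    out[0] = noise[0]
    for t in range(1, n):
        out[t] = phi * out[t - 1] + noise[t]
    return out


# Two independent but strongly autocorrelated series (hypothetical data).
demo_rng = np.random.default_rng(42)
x = ar1(300, 0.95, demo_rng)
y = ar1(300, 0.95, demo_rng)
r, p = surrogate_p_value(x, y, n_surrogates=1000, seed=0)
print(f"r = {r:.3f}, surrogate p = {p:.3f}")
```

Only one of the two series is randomized; because the surrogate keeps that series' power spectrum, the null distribution reflects how large a correlation slow drifts alone can produce.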

      (5) Analysis choices

      If I'm understanding correctly, the algorithm used to identify modules does so by assigning nodes to communities, but it does not itself restrict what edges can be formed from these modules. This makes me wonder whether the decision to focus only on connections between adjacent modules, rather than considering the full connectivity, was an analytic choice by the authors. If so, could you clarify the rationale? In particular, what justifies assuming that the gradient of PE states should be captured by edges formed only between nearby modules (as shown in Figure 2E and Figure 4), rather than by the full connectivity matrix? If this restriction is instead a by-product of the algorithm, please explain why this outcome is appropriate for detecting a global signature of PE states in both task and rest.

      When assessing the correspondence across task-fMRI and rs-fMRI in section 2.2.2, why was the pattern during task calculated from selecting a pair of bilateral ROIs (resulting in a group of eight ROIs), and the resting state pattern calculated from posterior-anterior/ventral-dorsal fluctuation modules? Doesn't it make more sense to align the two measures? For example, calculating task effects on these same modules during task and rest?

    1. "You should have seen the crap quality of that farmer's welding."

      This is one of my favorite parts just because of how dismissive these corporations are to prior art. I've seen this pattern repeatedly in software development where solutions are dismissed because the quality of the code is seen as inferior.

    1. And because our (digital) prototypes try to be used/validaded mainly by communities instead of by academic peers, we need to care about the practicalities of such prototypes and their insertion in the communities. In my experience, this practical insertion could happen via two complementary strategies: the encompassing one and embedding one. The encompassing strategy could be exemplified by the Smalltalk variants, like Pharo or GToolkit, with their OS and IDE rolled into one approach. Here, a single computing experience includes "everything" a community artifact could need: object networks acting as "app(s)"3, persistance, data formats, IDEs, graphical stack, debbugers and so on. The practicalities are related with the collapse of incidental complexity when the community has a single metatool to bridge their other tools and workflows. We use what I call "interstitial programming" to bridge socio-technical systems by changing what happens in the gaps/bridges between them, instead of changing them from inside. This was the approach I followed with Grafoscopio, since late 2014 and early 2015 until present day, with pretty good results and fluency, allowing us to make several prototypes and empowering practices convering diverse needs: from self (PDF/web) publishing, to civic tech and political oversight, community learning and memory, amont other themes (chosing needs and topics in resonance with the community is key in having this prototypes as living artifacts in such community). The embedding strategy could be exemplified by Lua and its variants, like YueScript. Here, an already existing tool/experience is extended from inside or by complementing and then replacing an existing tool/practice, and while this contrast the "interstitial" approach mentioned above, still shares the concern of dealing with needs felt in the community in its current workflows and tools. 
This is the strategy I plan to explore this year, particularly regarding the publishing workflows/formats of several local grassroots communities, and to compare with how I'll be implementing part of such ideas in Grafoscopio (keeping on with the encompassing strategy). While previously I thought in Fengari as my way to implement embeddability to increse agency in the (web) tools, the recent developments on hypermedia systems make me think that I can keep avoiding JavaScript4 and implement the strategy server side by reimagining TiddlyWiki in Lua+YueScript. Cardumem is the working name for such idea, and as explained in that link the intend is to provide a similar gentle learning curve between being a content creator and a functionality creator, that TiddlyWiki give us, while being able to generalize the concepts learnt while using and extending the wiki in its own functional DSL to other computing languages (for more details and links to the TW's community discussion visit the previos link). So, regarding the "Not Invented Here syndrome", the differences with TiddlyWiki are enough to justify why we need to invest all that work in Cardumem, as community and (inter)personal knowledge management is a core concern5 in the Grafoscopio community, to the point that we need to reinvent the wheel, for the contexts where the already existing ones don't work as we expect for our needs. While learning Lua and YueScript, I frequently miss a lot of the code liveness and the interactive documentation of the "Argumentative Driven Development" (ADD? 🤔) that I already enjoy within Grafoscopio over Pharo/GToolkit. So I thought that my first job would be to implement some kind of minimal notebook publishing on Lua, inpired by Clojure's Clerk6 and Julia's Pluto, but quite more static, at least as the begining (see Boostrapping a Lua notebook for more details). 
But finally a minimal Lua long comment + "markup tag" was good enough to have my documentation in the Lua files to postpone the idea, while exploring the HTML interactive interfaces provided by HTMX. Instead the design has been guided by the needs I have with my students/apprentices in my classes this semester at the university and future workshops in the hackerspace. And it has been a pretty fruitful design space/practice, where UI and functionality emerge organically, with the lessons I need to learn to provide the experience I need/want. There is still a long path to walk, but the initial advances are promising. Let's see how I walk the exploration map sketched here in this pendular movement from encompassing to embedding strategies and from abstraction to concrete implementations. I will document my advances in the entries to come.

      Technology designed for communities must be practical and not only theoretical, and two strategies can be used to achieve this: the encompassing one, which offers an all-in-one tool such as Grafoscopio, or the embedded one, which improves the tools people already use, as shown with Cardumem. The idea is to find a balance between these two approaches so that technology reaches the real needs of a community and not only the academic or operational environment of programming.

    2. When I zoomed out from our practices on critical code/data literacy using metatools and pocket infrastructures to reformulate them in the broader convivial computing, one of the emphases was how to increase (inter)personal and community agency with artifacts and practices that were deeply rooted in and concerned with specific communities in particular contexts, despite the generalizing possibilities of the concepts and recontextualizing possibilities of the practices and artifacts. This is something that I found kind of abstracted in the Global North counterparts of this genealogy, in things like "the developers" or "the user", in contrast with The People of the Center in the Colombian Amazonas, the local hackerspace, or a food sovereignty and solidarity savings collective in the Colombian coffee region.

      In some places people talk about developers and those who program, but here, in Colombia, technology is made with something more grounded in mind that is part of Colombian culture, such as Indigenous or peasant communities. In this way, systems become a medium or tool that is not so detached but can instead be part of people.

    1. standard compiler: takes a whole computer program and turns it all into binary so it can be run later

      I remember my cs professor's joke: we code to compile another compiler to compile another code. Essentially, this is a chicken and egg problem and I remember being so confused in my class.
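The "translate everything first, run later" idea can be loosely illustrated with Python's built-in `compile()` and `exec()`. This is only an analogy: Python produces a bytecode object rather than native binary, and the sample program and the name `square` are made up for the illustration.

```python
# Illustration only: compile() translates a whole source string into a code
# object (Python bytecode, not native binary) that can be executed later.
source = """
def square(n):
    return n * n

result = square(7)
"""

# Step 1: translate the entire program up front, without running it.
program = compile(source, "<example>", "exec")

# Step 2: run the already-translated program at some later point.
namespace = {}
exec(program, namespace)
print(namespace["result"])
```

The two steps are separate on purpose: nothing in the program runs during step 1, which is the property the definition above describes.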

    1. Reviewer #2 (Public review):

      Summary:

      This study establishes a platform for studying mosquito flight activity over the course of several weeks and demonstrates key applications of such a paradigm: the comparison of daily activity profiles across different Aedes aegypti populations and the quantification of responses to physiological and environmental perturbations.

      Strengths:

      (1) Overall, the authors succeed in setting up a low-cost, scalable tracking system that stably records mosquito flight activity for several weeks and uses it to demonstrate compelling use cases.

      (2) The text is organized well, is easy to read, and is understandable for a broad audience.

      (3) Instructions for constructing housing and for performing tracking with a dedicated GUI are available on an accompanying website, with open-source (and well-organized) code.

      (4) A complementary pair of methods (one testing for activity signals at specific times of the day, and the other capturing broader daily patterns) is used effectively.

      Weaknesses:

      (1) In the interval-based GLMM results, since each time interval is tested independently, p-values should be corrected for multiple hypotheses (for instance, through controlling the false discovery rate).

      (2) The accompanying GUI application needs some modifications to fully work out of the box on a sample video.
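The false discovery rate control suggested in weakness (1) could be sketched as below. This is a minimal Benjamini-Hochberg adjustment in plain Python, offered as one common choice and not necessarily the procedure the reviewer or authors have in mind; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per tested time interval.
interval_pvalues = [0.01, 0.04, 0.03, 0.20]
adjusted_pvalues = benjamini_hochberg(interval_pvalues)
print(adjusted_pvalues)
```

Intervals whose adjusted value falls below the chosen FDR level (for example 0.05) would then be reported as significant.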

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the editor and the reviewers for their positive and constructive comments. Below are our point-by-point responses.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Metabolic dysfunction-associated steatotic liver disease (MASLD) ranges from simple steatosis through steatohepatitis and fibrosis/cirrhosis to hepatocellular carcinoma. In the current study, the authors aimed to determine the early molecular signatures differentiating patients with MASLD-associated fibrosis from those patients with early MASLD but no symptoms. The authors recruited 109 obese individuals before bariatric surgery. They separated the cohorts as no MASLD (without histological abnormalities) and MASLD. The liver samples were then subjected to transcriptomic and metabolomic analysis. The serum samples were subjected to metabolomic analysis. The authors identified dysregulated lipid metabolism, including glyceride lipids, in the liver samples of MASLD patients compared to the no MASLD ones. Circulating metabolomic changes in lipid profiles slightly correlated with MASLD, possibly due to the no MASLD samples derived from obese patients. Several genes involved in lipid droplet formation were also found elevated in MASLD patients. Besides, elevated levels of amino acids, which are possibly related to collagen synthesis, were observed in MASLD patients. Several antioxidant metabolites were increased in MASLD patients. Furthermore, dysregulated genes involved in mitochondrial function and autophagy were identified in MASLD patients, likely linking oxidative stress to MASLD progression. The authors then determined the representative gene signatures in the development of fibrosis by comparing this cohort with the other two published cohorts. Top enriched pathways in fibrotic patients included GTPase signaling and innate immune responses, suggesting the involvement of GTPase in MASLD progression to fibrosis. The authors then challenged a human patient-derived 3D spheroid system with a dual PPARa/d agonist and found that this treatment restored the expression levels of GTPase-related genes in MASLD 3D spheroids. 
In conclusion, the authors suggested the involvement of upregulated GTPase-related genes during fibrosis initiation. Overall, the current study might provide some resources regarding transcriptomic and metabolomic data derived from obese patients with and without MASLD. However, several concerns should be carefully addressed.

      1. A recent study, via proteomic and transcriptomic analysis, revealed that four proteins (ADAMTSL2, AKR1B10, CFHR4 and TREM2) could be used to identify MASLD patients at risk of steatohepatitis (PMID: 37037945). It is not clear why the authors did not include this study in their comparison.

      Thank you for the suggestion. The RNA sequencing dataset (GSE135251) from study PMID 37037945 is the same dataset we used as an external benchmark in our study, referred to as the EU cohort on page 4 in the manuscript. In addition to PMID 37037945, we have cited the original transcriptomic study (PMID 33268509) for the EU cohort. In the revised manuscript, we discussed this proteome-transcriptome paper in the Discussion section and highlighted the potential of AKR1B10 as a biomarker in early MASLD.

      The authors recruited 109 patients but only performed transcriptomic and metabolomic analysis in 94 liver samples. Why did the authors exclude other samples?

      We thank the reviewer for their question and we understand the confusion. The discrepancy in sample size between liver and plasma cohorts is due to the fact that, for certain cases, we were unable to get sufficient liver tissue slices (“Exclusion criteria included: age

      The authors mentioned clinical data in Table 1 but did not present the table in this manuscript.

      Table 1 (key patient characteristics) was included in the main document after the Methods section, and Table S1 (additional patient characteristics) was provided as a supplemental file in our original submission.

      The generated metabolomic data could be a very useful resource to the MASLD community. However, it is very confusing how the data was generated in those supplemental tables. There is no clear labeling of human clinical information in those tables. Also, what do those values mean in columns 47-154? This reviewer assumed that they are the raw data of metabolomic analysis in plasma samples. However, without clear clinical information in these patients, it is impossible that any scientist can use the data to reproduce the authors' findings.

      We appreciate this suggestion. To ensure accessibility of the data resources, we created a GitHub repository for both data and code, available at https://github.com/SLINGhub/MASLD_dual_omics____.

      The GitHub repository includes clinical data for all 109 participants with patient characteristics and histological gradings, as well as processed omics data (log₂-transformed). We have generated artificial IDs for each patient so that we can include all the requested data in an organized manner. A code template is also provided to replicate the main statistical results from this study. In addition, for readers interested in conducting analyses from the raw data, we have deposited the raw sequencing files and mass spectrometry data in GEO and Zenodo, as detailed in the ‘Data Availability’ section.

      In Fig. 5B, the authors excluded the steatosis and fibrosis overlapped genes. Steatosis and fibrosis specific genes could simply reflect the outcomes rather than causes. In this case, the obtained results might not identify the gene signatures related to fibrosis initiation.

      We appreciate this comment, but we do not fully understand the reviewer’s point since we did not exclude overlapped genes in our analysis, and it was unclear to us whether excluding overlapping genes has anything to do with causality of both processes.

      In Figure 5B, we identified the gene signatures associated with steatosis and fibrosis after adjusting for potential confounders such as age, sex, BMI and diabetes status. Our results showed that these signatures were relatively independent, sharing a limited number of genes. We then examined genes uniquely associated with each process by additional adjustment (e.g., adjusting steatosis models for fibrosis grades). To us this was not an unreasonable approach, given that steatosis precedes fibrosis in most cases, especially in morbid obesity.

      We nevertheless agree with the reviewer’s point that the gene expression changes we identified represent statistical associations without warranting causality. To specifically address fibrosis initiation mechanisms within the limitation of the current study design, we performed a separate comparative analysis between patients with fibrosis+steatosis versus those with steatosis alone (Table S11), which still identified GTPase regulation as a potential key mechanism in fibrosis initiation (Figure 6B).

      In Fig. 6D, the authors used 3D liver spheroid to validate their findings. However, there is no images showing the 3D liver spheroid formation before and after PPARa/d agonist treatment. It is not clear whether the 3D liver spheroid was successfully established.

      There is extensive literature (>40 papers) from the Lauschke lab on 3D liver spheroid culture, including but not limited to PMIDs 27143246, 28264975, 32775153, 37870288 and 39605182. Images of the spheroids can be seen in Figure 1c of Adv. Sci. 2024, 2407572, and elafibranor treatment did not affect the morphology of the spheroids.

      The authors suggested that targeting LX-2 cells with Rac1 and Cdc42 inhibitors could reduce collagen production. Did the authors observe these two genes upregulated in mRNA and protein expression levels in their cohort when comparing MASLD patients with and without fibrosis? Did the authors observe that the expression levels of Rac1 and Cdc42 are correlated with fibrosis progression in MASLD patients?

      Regarding comments 7 and 8, we targeted Rac1 and Cdc42 in the LX-2 cell experiment as they are common and major GTPases. Protein-level data are not available in our dataset, but we examined their transcript-level expression. RAC1 and CDC42 expression levels were positively associated with fibrosis progression, with coefficients of 0.362 (q = 0.027) and 0.342 (q = 0.031), respectively. These results are presented in Table S5, and the corresponding boxplots are shown here.

      Figure R1. RAC1 and CDC42 expression levels in individuals with different fibrosis levels.

      Other studies have revealed several metabolite changes related to MASLD progression (PMID: 35434590, PMID: 22364559). However, the authors did not discuss the discrepancies between their findings and the previous studies.

      Thank you for the suggestion. We have incorporated a discussion of the two studies into the Discussion section, highlighting the consistencies and discrepancies between our plasma metabolomic results and previous findings. The main differences may stem from variations in MASLD spectrum and the degree of obesity in the cohorts.

      Reviewer #1 (Significance (Required)):

      Overall, the current study might provide some new resources regarding transcriptomic and metabolomic data derived from obese patients with and without MASLD. The MASLD research community will be interested in the resource data.

      We thank this reviewer for the positive and constructive evaluation of our manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this paper, Kaldis and collaborators investigate the molecular heterogeneity of a cohort of 109 morbidly obese patients, focusing on liver transcriptomics and metabolomics analysis from liver and serum. The main finding (i.e. upregulation of GTPase-coding genes) was validated in spheroids and a human HSC cell line. As these proteins are involved in critical cellular functions related to metabolism and cytoskeleton dynamics, these findings shed light on their involvement in human liver pathology, which has so far been poorly (or not at all) documented. This is an interesting addition to the current knowledge about chronic liver pathology. However, the manuscript suffers from the lack of a clear-cut definition of patient subgroups and the seemingly indistinct use of generic (MASLD, NAS score) and more granular terms (MASH, fibrosis) across the various analyses they performed.

      We thank this reviewer for highlighting the novelty of our manuscript. We agree that mixing generic and granular terms can be confusing. We tried to use terms consistently throughout, and this has been further improved in the revised version.

      Figure 1 and Table 1 provide comprehensive information regarding histological phenotypes, NAS scores, and patient characteristics. From Figure 2 onward, we specifically focused on steatosis and fibrosis as distinct histological features, identifying molecular signatures associated with each process.

      The term ‘MASH’ was used only when referring to the ex vivo 3D spheroids derived from histologically confirmed MASH patients for validation purposes. As our primary cohort represents early disease stages, we did not characterize molecular features of MASH in that data set.

      In this cohort, the term 'NAS' was mentioned only in Section 1 to characterize the disease spectrum. Additionally, in Figures 3A and 6A, we illustrated the association between gene expression levels and NAS in two external cohorts. This was due to the absence of steatosis grades in the two datasets. NAS is an additive measure of multiple scores (steatosis, inflammation and ballooning), but does not account for fibrosis grades.

      Our study focuses on the molecular features of steatosis grades and fibrosis grades as the main histological processes, with all terminology aligned with this stated objective. This allows us to map the transcriptome and metabolome to pathologist-defined steatosis/fibrosis severity (i.e., 0,1,2,3) and identify genes/metabolites that are correlated with increasing steatosis/fibrosis score.

      Major comments:

      • Are the key conclusions convincing?

      The conclusions are generally consistent with findings from numerous previous studies, as many of the genes identified and their associations with disease states have been previously reported. However, I found it difficult to discern which specific disease stages the authors are referring to throughout the manuscript. Terms such as MASLD (Fig. 1F), steatosis (Fig. 4A), MASH, fibrosis (Fig. 6), and the composite NAS score (Fig. 1G) are used interchangeably, without clearly explaining whether or how the patient cohort was stratified to distinguish between isolated steatosis, MASH, and MASH with or without fibrosis. It is also unclear whether subgroups were propensity score-matched.

      As explained in our previous point, we believe that we did not carelessly use the terms interchangeably, but rather used them as they were available or pertinent to the comparisons in discussion. We have provided a comprehensive cohort description in the first section (Table 1, including all histological features and NAS scores), then focused specifically on steatosis and fibrosis in subsequent analyses. We identified distinct molecular processes underlying these two histological features and validated key fibrosis-related pathways.

      Regarding the comment on ‘propensity score-matched subgroups’, we would like to clarify that the only “sub”-group analysis performed in this paper is the transition from steatosis to steatosis with fibrosis. We have consistently used linear regression as the association analysis framework, without binarization of outcomes. We note that this is a cross-sectional study with a challenging recruitment situation from a bariatric surgery clinic that naturally represents the spectrum of MASLD in obesity. We acknowledge that the sampling can always be biased in such a study. However, given the invasiveness of liver resection, the study is also limited by the reality that not all patients would agree to the study, nor is it feasible to form a perfect subgroup meeting a 1:1 ratio as in large-scale epidemiology studies based on plasma samples.

      In a related point, the authors mention that 76% of patients are non-fibrotic, introducing a marked imbalance between fibrotic (n=26) and non-fibrotic (n=83) samples. Given this disparity and potential inter-individual variability, it would be helpful to include observed fold changes or effect sizes to give readers a sense of the magnitude of the biological dysregulations being reported.

      As explained in our previous response, our study design examines associations between histological and molecular features rather than using a case-control approach. For effect size quantification, we report standardized linear regression coefficients, i.e. the change in gene expression Z-score per one-point increase in steatosis or fibrosis grade. We also provided fold changes in our comparative analysis of steatosis+fibrosis versus fibrosis-free steatosis. These effect sizes were fully documented in the Supplemental Tables.
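The effect size described above (change in expression Z-score per one-point increase in grade) can be sketched as follows. This is a deliberately simplified, single-predictor version with no covariate adjustment, so it is not the authors' adjusted model; the function names and the toy data are hypothetical.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


def ols_slope(x, y):
    """Slope of the simple least-squares line y ~ x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_coefficient(expression, grades):
    """Change in expression Z-score per one-point increase in grade."""
    return ols_slope(grades, zscore(expression))


# Toy data: one gene measured in four samples with grades 0 to 3.
grades = [0, 1, 2, 3]
expression = [1.0, 2.0, 3.0, 4.0]
coefficient = standardized_coefficient(expression, grades)
print(round(coefficient, 4))
```

In the adjusted setting described in the response, the same interpretation applies to the coefficient of the grade term in a multiple regression that also includes age, sex, BMI and diabetes status.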

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      • The authors seem pretty enthusiastic about elafibranor, despite a failed phase 3 clinical trial. I would qualify elafibranor as a useful tool in preclinical models.

      We agree with the reviewer and indeed used elafibranor as a research tool for PPARa/d modulation rather than a clinically promising prospect. Discussion regarding elafibranor has been updated.

      • The authors should make clear the pronounced sex bias in their study, which includes mostly women (and btw refer to sex and not gender in the manuscript).

      Thank you for this important point. We added "Notably, the cohort was predominantly female (76.1%)" to the 'Overview of the study' section in the manuscript. We also replaced all 'gender' with 'sex' throughout the manuscript. In this cohort, individuals with previous gender reassignment were excluded (see Materials and Methods).

      • The "MASH" status of the spheroid model is overstated. As described in the text it is much closer to a lipotoxicity model (and even glucotoxicity as Glc concentration is 2g/L).

      The 3D cultures were established from cells isolated from patients with histologically confirmed MASH. Besides steatosis, we observe increased secretion of pro-inflammatory cytokines, activation of hepatic stellate cells and increased deposition of collagen, thus phenocopying the critical disease hallmarks. Additionally, unbiased omics profiling (transcriptomics, proteomics and lipidomics) reveals significant increases in collagen biosynthesis, inflammatory signaling and cholesterol biosynthesis in MASH patient-derived cultures compared to controls. These differences largely overlapped with the results from analyses of six MASH case-control cohort studies. All of these results have been published previously (PMID 39605182).

      This is confusing with panel D in which the authors establish a relationship between fibrotic patients (F2/F3 vs F0/S0, so I guess "no MASLD" liver?) and this model. Is the relationship maintained for steatotic-only patients?

      In Figure 6D, we compared GTPase-related gene expression between patients with fibrosis grade 2/3 (n = 26) and those without fibrosis and steatosis (n = 24). Principal component regression resulted in a positive correlation (β = 9.97) between log2 fold changes in 3D spheroids and human fibrosis samples, indicating consistent directional changes in both systems.

      To answer the question from the reviewer, we compared the expression levels of GTPase-related genes in patients with steatosis but no fibrosis (n = 18) to those without fibrosis and steatosis (n = 24) and observed a negative correlation (β = -10.91). This indicates that GTPase-related gene changes in our 3D spheroids do not align with steatosis-related changes in humans.

      Therefore, under the assumption that fibrosis follows steatosis in the majority of cases of MASLD progression, the result indicates that the alterations in GTPase-related gene expression in the 3D spheroid model specifically reflect fibrosis rather than steatosis.

      Figure R2. Comparison of expression level changes in GTPase-related genes between this human cohort and an independent 3D spheroid system: (A) positive correlation with fibrosis grade 2/3 patients versus controls (left), and (B) negative correlation with steatosis-only patients versus controls (right).

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      I am not convinced that HSC and LX2 cells express significant levels of PPARα. However, did the authors check for this parameter in their LX2 cell line and assess whether PPARα/b activation by elafibranor (and/or pemafibrate as it is PPARα selective) alters GTPase expression? Whether negative or positive, this could give a clue about possible intercellular crosstalk in the spheroid model.

      We thank this reviewer for pointing this out. In response, we analysed the mRNA expression of all PPARs in LX-2 cells with and without Elafibranor treatment (see Figure R3, same as Figure S8G in the Supplemental Material). We confirmed PPARs are expressed in LX-2 cells at the mRNA level (Figure R3A). Elafibranor does not affect their mRNA levels, which is consistent with previous reports that its primary mechanism is through binding and altering the activity of PPAR proteins, not gene expression (PMID 33326461 and PMID 37627519).

      Figure R3. Gene signatures in LX-2 cells with and without Elafibranor treatment (n = 3).

      In addition, we assessed mRNA levels of selected GTPase-related genes in LX-2 cells with and without Elafibranor treatment (Figure R3B). Although statistical power was limited, we observed a consistent trend toward reduced RHOU, DOCK2, and RAC1 expression with Elafibranor. This preliminary signal suggests that Elafibranor may counter the elevated GTPase levels seen in MASH patient spheroids, potentially via crosstalk among hepatic cell types, including HSCs.

      To further investigate intercellular crosstalk in GTPase regulation among hepatic cell types, we evaluated signature GTPase-related genes in LX-2 cells, spheroid co-cultures (hepatocytes, HSCs, Kupffer cells), and hepatocyte monocultures. As shown in Figure R4 (same as Figure S10 in the supplemental material), TGFB1 served as a positive control, exhibiting the most pronounced induction upon TGF-β1 treatment in hepatocytes. Despite varied alterations across the selected GTPase-related genes, TGF-β1 treatment produced a trend toward increased VAV1 and DOCK2 expression in co-culture, hepatocytes, and LX-2 cells, and this was reversed by the TGF-β inhibitor in co-culture and hepatocytes. Other GTPase genes, including RAC1, RAB32, and RHOU, displayed cell type–specific responses to TGF-β1. These observations suggest that the regulation of GTPases is mediated by multiple hepatic cell types, supporting the importance of intercellular crosstalk.

      Figure R4. Expression of GTPase-related genes in spheroid co-culture, hepatocyte monoculture, and LX-2 cells (n = 3). Controls for each gene and experiment were normalized to 1 to enable comparison across treatment groups.
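The normalization mentioned in the Figure R4 caption (controls set to 1) could look roughly like the sketch below. The data layout, condition labels and values are hypothetical and are not taken from the study; the sketch only shows dividing every measurement by the mean of the control group for one gene in one experiment.

```python
from statistics import mean


def normalize_to_control(values_by_condition, control="control"):
    """Scale all measurements so the control group has a mean of 1."""
    control_mean = mean(values_by_condition[control])
    return {
        condition: [value / control_mean for value in values]
        for condition, values in values_by_condition.items()
    }


# Hypothetical relative expression values for one gene in one experiment.
measurements = {
    "control": [2.0, 4.0],
    "TGF-b1": [6.0, 9.0],
}
normalized = normalize_to_control(measurements)
print(normalized)
```

After this step, a treated value of 2.0 reads directly as a two-fold level relative to control, which is what makes treatment groups comparable across genes and experiments.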

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The experiment mentioned above is cheap (cell culture, RT-QPCR) and can be performed within a couple of weeks.

      • Are the data and the methods presented in such a way that they can be reproduced? Yes

      • Are the experiments adequately replicated and statistical analysis adequate? There is no indication of group size, number of replicates for in vitro experiments

      Thank you for this suggestion. We have added the sample sizes to all relevant sections: ‘n = 4’ in the figure legends for 3D spheroid experiments and ‘n = 8–10’ for the LX-2 experiments. This information has also been incorporated into the corresponding experimental descriptions in the Methods section.

      **Referees cross-commenting**

      I believe there is a general consensus on this potentially interesting contribution to the field, with three main points: (1) the need for a careful group-by-group comparison that accounts for potential confounders, (2) a more rigorous exploitation/characterization of the spheroid system, and (3) the need to benchmark the authors' findings against the available literature.

      Thank you for summarizing the main points. Our responses are as follows:

      • We adjusted for key confounders (sex, age, BMI, diabetes) in all statistical analyses to minimize potential bias, mostly using linear regression (rather than group-to-group comparison). In response to Reviewer 3, comment 1, we also conducted additional statistical analyses exploring molecular changes in diabetic vs. non-diabetic individuals.
      • We provided detailed characterization of the spheroid model (response to Reviewer 3, comment 3) and we have done additional experiments in LX-2 cells.
      • We benchmarked our findings using external human cohorts, mouse models, and single cell spheroid systems:
      • Compared our liver transcriptomics data with two published liver RNA-seq datasets (EU cohort, PMID 31467298; VA cohort, PMID 33268509) as shown in Figure 1G. In Figures 3A and 6A, we also included sidebars indicating gene alterations in these cohorts, showing consistent trends. Moreover, we examined the expression alterations of GTPase-related genes in these datasets in response to Reviewer 3’s comment 2.
      • Assessed genes linked to fibrosis progression in hepatic stellate cells from a murine liver fibrosis model (PMID 34839349), confirming differential expression of GTPases and their regulators during fibrosis initiation (Figure S9A).
      • Examined GTPase-related genes in an independent single-cell human spheroid system (PMID 37962490). This enabled cell-type-specific information of GTPase regulation in response to TGF-β (Figure S9C). We also expanded the discussion section on both the consistencies and discrepancies between our findings and previously published studies.

      Reviewer #2 (Significance (Required)):

      The authors identified GTPases as players in the progression of MASLD. This is an interesting preliminary report warranting further molecular investigations (in which liver cell types, which GTPase pathway(s) are involved, which functions are controlled through this pathway...)

      • State what audience might be interested in and influenced by the reported findings.

      This paper will have an impact in the hepatology field

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I have expertise in the analysis of "MASLD" human cohorts and in the molecular biology of chronic liver diseases.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Metabolic dysfunction-associated steatotic liver disease (MASLD) describes a spectrum of progressive liver pathologies linked to lifestyle-associated metabolic alterations (such as increased body weight and elevated blood sugar levels), reaching from steatosis over steatohepatitis to fibrosis and finally end stage complications, such as liver failure and hepatocellular carcinoma. Treatment options for MASLD include diet adjustments, weight loss, and the thyroid hormone receptor-β (THR-β) agonist resmetirom, but remain limited at this stage, motivating further studies to elucidate molecular disease mechanisms to identify novel therapeutic targets. In their present study, the authors aim to identify early molecular changes in MASLD linked to obesity. To this end, they study a cohort of 109 obese individuals with no or early-stage MASLD combining measurements from two anatomic sites: 1. bulk RNA-sequencing and metabolomics of liver biopsies, and 2. metabolomics from patient blood. Their major finding is that GTPase-related genes are transcriptionally altered in livers of individuals with steatosis with fibrosis compared to steatosis without fibrosis.

      Major comments:

      1. Confounders (such as (pre-)diabetes) The patient table shows significant differences in non-MASLD vs. MASLD individuals, with the latter suffering more often from diabetes or hypertriglyceridemia.

      Rather than just stating corrections, subgroup analyses should be performed (accompanied with designated statistical power analyses) to infer the degree to which these conditions contribute to the observations. I.e., major findings stating MASLD-associated changes should hold true in the subgroup of MASLD patients without diabetes/of female sex and so forth (testing for each of the significant differences between groups).

      Our original statistical analysis employed linear regression to examine associations between molecular variables (genes/metabolites) and histological progression (steatosis and fibrosis), with adjustment for potential confounders including diabetic status, age, sex, and BMI. We specifically focused on these two histological features to elucidate the disturbed molecular processes during their progression. Regression coefficients represent the expected change in abundance levels (in units of standard deviation of the corresponding molecule) per one-unit increase in histological grades.

      To address the reviewer's question, we conducted additional subgroup analyses to determine whether our major findings remain consistent in individuals with and without diabetes. We assessed linear associations between gene signatures and histological features separately in non-diabetic (n = 71) and diabetic individuals (n = 23). Statistical power was estimated by comparing the variance explained by the full regression model (y ~ x + a + b + c) against the reduced model (y ~ a + b + c), converting the incremental R<sup>2</sup> for x into Cohen's f<sup>2</sup>, and applying pwr.f2.test with the corresponding degrees of freedom and sample size at α = 0.05.
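
      The conversion from incremental R<sup>2</sup> to Cohen's f<sup>2</sup> described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example R<sup>2</sup> values, and the assumption that the full model has four predictors (x, a, b, c) are hypothetical, and the noncentrality parameter f<sup>2</sup>·(u + v + 1) reflects our understanding of the convention used by pwr.f2.test rather than anything stated in the response.

```python
def cohens_f2(r2_full: float, r2_reduced: float) -> float:
    """Cohen's f^2 for the predictor(s) present only in the full model.

    f^2 = (R2_full - R2_reduced) / (1 - R2_full)
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("need 0 <= R2_reduced <= R2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def f2_test_inputs(r2_full, r2_reduced, n, n_predictors_full, n_tested=1):
    """Quantities one would pass to an F-test power routine.

    u: numerator degrees of freedom (number of tested predictors)
    v: denominator degrees of freedom (n - predictors in full model - 1)
    ncp: noncentrality parameter, assumed here to be f2 * (u + v + 1)
    """
    u = n_tested
    v = n - n_predictors_full - 1
    f2 = cohens_f2(r2_full, r2_reduced)
    return {"f2": f2, "u": u, "v": v, "ncp": f2 * (u + v + 1)}


# Hypothetical R^2 values; n = 71 matches the non-diabetic subgroup size.
example = f2_test_inputs(r2_full=0.30, r2_reduced=0.20, n=71, n_predictors_full=4)
print(example)
```

      With these made-up inputs the tested predictor has 1 numerator and 66 denominator degrees of freedom; the power itself would then come from the noncentral F distribution at α = 0.05.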

      For both steatosis and fibrosis, the results in the non-diabetic subgroup (n = 71) showed high consistency with findings in our original analysis (n = 94, adjusted for diabetes), indicating that our originally reported gene signatures, after correction for diabetic status, remain valid in non-diabetic individuals.

      In contrast, for diabetic individuals (n = 23), associations between genes and histological features did not closely replicate our original findings. Notably, we observed larger estimated effects for fibrosis-associated genes in diabetic individuals, suggesting a potential interaction between diabetes and fibrosis progression.

      Figure R5. Subgroup analysis of the association between gene expressions and steatosis grades

      Figure R6. Subgroup analysis of the association between gene expressions and fibrosis grades

      On the comment "degree to which these conditions contribute to the observations," our original analysis adjusted for diabetes status to identify molecular signatures independently associated with fibrosis without the confounding of diabetes status. Consequently, the reported gene signatures in the original analysis more closely reflect patterns in the non-diabetes group, as demonstrated in our subgroup analysis plots. We also comment that, unfortunately, we did not adjust for the interaction of fibrosis and diabetes in the original analysis.

      Furthermore, our additional analyses revealed a close relationship between diabetes and liver fibrosis. Consistent with Figure 1C, hepatic fibrosis is significantly correlated with insulin resistance parameters in clinical assays, including blood insulin levels and HOMA2-IR. To explore this association further, we compared gene expression profiles between diabetic MASLD patients (n = 21) and non-diabetic MASLD patients (n = 43). Although few genes reached significance after multiple testing correction, 166 genes showed differential expression (p 0.32) between these groups.

      We identified 55 genes as potential "diabetic markers" that both showed differential expression between diabetic and non-diabetic MASLD patients and were significantly associated with steatosis or fibrosis progression. These genes are predominantly downregulated metabolic genes (e.g., BAAT, G6PC1, SULT2A1, MAT1A), suggesting that diabetes may exacerbate metabolic suppression as fibrosis advances. Given the high prevalence of diabetes in the MASLD population, our analysis supports the hypothesis that diabetes worsens MASLD outcomes, likely through impaired metabolic capability during fibrosis progression.

      Regarding the comment on the "subgroup of female sex," our original analysis also adjusted for sex as a potential confounder. Since our cohort is predominantly female (>76%), the majority of our findings likely hold true in the female sub-population, similar to what we observed in our diabetes subgroup analysis.

      External validation

      Additionally, to back up the major GTPase signature findings, it would be desirable to analyze an external dataset of (pre)diabetes patients (or other biased groups) for alterations in these genes. It would be important to know if this signature also shows in non-MASLD diabetic patients vs. healthy patients or is a feature specific to MASLD. Also, could the matched metabolic data be used to validate metabolite alterations that would be expected under GTPase-associated protein dysregulation?

      We appreciate the comments regarding validation of the GTPase signature as unique to MASLD using external datasets. As shown in our previous analysis, after adjusting for diabetes status, the gene signatures remained largely preserved in the non-diabetes subgroup. Before we respond further, we note that publicly available liver tissue data with appropriate, full-scale clinical metadata and sufficient sample sizes are extremely rare. To the best of our knowledge, the public datasets we brought into our paper were the most prominent data of reliable quality.

      In the paper, we benchmarked our RNAseq dataset against two datasets: the VA cohort and EU cohort (Figure 1). Our cohort focused primarily on early MASLD patients with obesity, which aligns more closely with the disease spectrum represented in the VA cohort (Figure 1G). Notably, in the published paper for the VA cohort, Hoang et al. highlighted Rho GTPase signaling as one of the top pathways in the fibrosis PPI network (Figure 1B from publication PMID 31467298).

      We interrogated GTPase-related genes in both the VA and EU cohorts. As shown in Figure R7 (below), GTPase-related genes demonstrated a strong association with fibrosis grades in the VA cohort, as expected. The EU cohort comprises more advanced MASLD cases with higher fibrosis grades, and our re-analysis in this cohort specifically focused on MASH patients (as designated by the authors). In those MASH patients, GTPase-related genes did not show significant positive associations with fibrosis progression. This finding is consistent with our hypothesis that GTPase regulation is triggered more prominently during the early progression of fibrosis than at later stages.

      Unfortunately, diabetes status was not available in the GEO repository for the VA cohort. Available liver tissue sequencing datasets with balanced representation of diabetic and nondiabetic patients are rare, especially those derived from obese individuals and reflecting the early-to-middle stages of MASLD. In our own cohort, for instance, only two diabetic patients without MASLD were recruited (Table 1). While we cannot rule out a role for insulin resistance in GTPase regulation, we plan future experiments using mouse models to examine GTPase-mediated fibrosis under diabetic and nondiabetic conditions.

      Regarding the comment ‘validate metabolite alterations that would be expected under GTPase-associated protein dysregulation,’ we note that GTPases are primarily involved in cytoskeletal organization, vesicle trafficking, and other cellular processes, with few well-established links to specific metabolite signatures. Nevertheless, in our partial correlation network integrating hepatic genes and metabolites, we observed co-regulated metabolites associated with GTPase-related genes (Figure R8). These included palmitoleoyl ethanolamide (N-acylethanolamine, an anti-inflammatory metabolite and PPARα ligand), phenylacetic acid (a phenylalanine metabolite), biotin (a coenzyme), arginine, lysine, melatonin (a tryptophan metabolite), and several lipid species such as PC 32:0 and CAR 20:1. While causal relationships cannot be inferred from this dataset, our integrative network highlights potential connections related to the trafficking of these metabolites that warrant further investigation.

      Figure R7. Associations between GTPase-related genes and fibrosis in this study and two external cohorts. Asterisks denote significant associations by q value.

      Figure R8. Integrative subnetwork of GTPase-related genes. Blue squares represent GTPase-related genes, red circles indicate metabolites connected to these genes, and the purple diamond denotes fibrosis, which is connected to RHOU.

      3D liver spheroid MASH model, Fig. 6D/E

      This 3D experiment is technically not an external validation of GTPase-related genes being involved in MASLD, since patient-derived cells may only retain changes that have happened in vivo. To demonstrate that the GTPase expression signature is specifically invoked by fibrosis, the LX-2 setup is more convincing; however, the up-regulation of the GTPase-related genes upon fibrosis induction with TGF-beta, in concordance with the patient data, needs to be shown first (qPCR or RNA-seq).

      We agree with the reviewer that experiments in LX-2 (HSC) cells are important and, as described under ‘Reviewer #2’, we have performed them (Figure R3 and Figure R4). Because HSCs comprise only a minor population of liver cells, the signals observed in patient bulk RNA data are likely driven primarily by hepatocytes. Nevertheless, we have highlighted the importance of hepatic cell crosstalk in Figure R4 and in our response to Reviewer #2. Additionally, in Supplementary Figure S9B, we identify the potential cell types of origin for the GTPase signals (predominantly hepatocytes and HSCs) using a single-cell dataset from an independent study (PMID: 37962490).

      Additionally, the description of the 3D model is too uncritical. The maintenance of functional human PHHs in 3D has only become available this year (PMID: 40240606), marking a breakthrough in the field. Since the authors did not use this system, I would strongly assume their findings are largely attributable to the mesenchymal cells in the 3D culture, and these limitations need to be stated.

      We humbly disagree with the reviewer on the 3D liver spheroids. The paper that the reviewer is referencing is related to the proliferation of hepatocytes in organoids, not – at least not directly – their functional maintenance. Here, we use a spheroid model of mature fully differentiated cells, which is conceptually different from the organoid approach. Maintenance of such functional human hepatocytes for multiple weeks in culture has been possible for close to a decade (PMID 27143246). Moreover, particularly for the modeling of chronic liver disease, such as MASH, it is important to use directly patient-derived cells as short induction cycles (typically 1-2 weeks) of disease phenotypes in organoid models do not faithfully reproduce the molecular signatures that stem from chronic exposures in vivo.

      The 3D liver spheroid model we used here is derived from livers from patients with a histologically confirmed diagnosis of MASH. The isolated cells are fully mature and thus do not require in vitro differentiation. There are no MSCs in the 3D cultures; rather the spheroids contain hepatocytes, stellate cells, Kupffer cells as well as various other immune cell types present in the liver at the time of isolation (T cells, B cells, NK cells). Furthermore, the model is extensively characterized at the transcriptomic, proteomic and lipidomic level (PMID 39605182).

      Novelty / references

      Similar studies that also combined liver and blood lipidomics/metabolomics in obese individuals with and without MASLD (e.g. PMID 39731853, 39653777) should be cited. Additionally, it would benefit the quality of the discussion to state how findings in this study add new insights over previous studies, if their findings/insights differ, and if so, why.

      Thank you for the suggestion. We added the two papers to the discussion section. Specifically, we discussed the consistent findings (such as AKR1B10 in PMID 37037945 and mitochondrial dysfunction in PMID 39731853) and discrepancies (such as limited plasma metabolomic changes and circulating sphingolipid alterations in multiple human and mouse models) in comparison with previously published omics studies in MASLD patients. We also thoroughly discussed our findings (e.g., lipid dysregulation, dysregulated tryptophan metabolism, GTPase regulation) and potential mechanisms, with extensive literature support from human, animal, and cell studies.

      Minor comments:

      1. The quality of Supplementary Figures (e.g. S7) makes it impossible to read the labels.

      Thank you for this feedback. The resolution of the figures was impaired in the initial upload. We will provide all supplementary figures in high resolution in our revised submission and ensure all labels are clearly readable.

      For Figure S7C, we presented the correlation matrix of more than 200 GTPase-related genes along with the TGF-β genes TGFB1 and TGFB3. This illustrates the overall co-expression patterns of GTPase-related genes rather than displaying individual gene labels, with arrows now included to highlight TGFB1 and TGFB3.

      Reviewer #3 (Significance (Required)):

      The authors provide an overall sound study on the hepatic transcriptomic and metabolomic signatures in an Australian cohort of 109 obese non-to-early stage MASLD patients. They perform thorough analyses of metabolome and transcriptome in liver biopsies and metabolome in blood, using standard technologies such as RNA sequencing and mass spectrometry. Their key finding is a GTPase-associated gene signature related to fibrosis onset. Limitations of the study include potential cohort confounders (raising the need for expanded control experiments), limited discussion of similar studies, and limits in cell-type resolution, the latter of which is related to the molecular readout and has in part begun to be addressed by in vitro experiments in an immortalized HSC line. Taken together, provided additional control analyses are performed, the results could be of interest to an expert community in the field of molecular hepatology and, while still descriptive, hold the potential to prompt mechanistic follow-up studies.

      We thank this reviewer for a balanced, positive, and constructive evaluation of our manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      It appears obvious that with little or no fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc., could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth.  Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number.  It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [RS]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler.  I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c<sub>max</sub> conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct.  The factor of 2.7∙10<sup>43</sup> was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values.  A simple way to determine this number is to have the simulation code print the value to which c<sub>max</sub>  is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values.  In another section of this response I will describe how to do this with the simulation code written and used by Siljestam and Rueffler; doing so confirms the value that I obtained with my own code.  Furthermore, I will now give a theoretical derivation of this factor.

      As specified by Siljestam and Rueffler, the positions of the m pathogens in (m-1)-dimensional antigenic space correspond to the vertices of a regular simplex centered at the origin, with distance between vertices equal to 1.  The squared distance from the origin to each of the m vertices of such a simplex is (m-1)/2m (https://polytope.miraheze.org/wiki/Simplex).  Thus, the sum of the m squared distances is (m-1)/2.  For the (0, 0) homozygote, condition is multiplied by a factor of exp(-(vr)<sup>2</sup>/2) for each pathogen, where r is the distance from the origin.  It follows that, with v=20, all the pathogens together decrease condition by a factor of exp(20<sup>2</sup>∙(m-1)/4) = exp(100∙(m-1)).  Thus, increasing or decreasing m by 1 changes this value by a factor of exp(100) = 2.7∙10<sup>43</sup>.
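
      The derivation above can be checked numerically. The sketch below is an illustration only, not the simulation code of either party: the helper names are hypothetical, and the simplex is built from scaled standard basis vectors of R<sup>m</sup>, which gives the same pairwise and centroid distances as the (m-1)-dimensional construction described in the text.

```python
import math


def simplex_vertices(m):
    """m points with unit pairwise distance: basis vectors of R^m scaled by 1/sqrt(2)."""
    s = 1.0 / math.sqrt(2.0)
    return [[s if i == j else 0.0 for j in range(m)] for i in range(m)]


def sum_squared_distances_from_centroid(m):
    """Sum over all vertices of the squared distance to the simplex centre."""
    verts = simplex_vertices(m)
    centroid = [sum(v[j] for v in verts) / m for j in range(m)]
    return sum(sum((v[j] - centroid[j]) ** 2 for j in range(m)) for v in verts)


def log_condition_penalty(m, v):
    """ln of the factor by which all pathogens together reduce the condition
    of the homozygote at the centre: (v^2 / 2) * sum of squared distances."""
    return 0.5 * v ** 2 * sum_squared_distances_from_centroid(m)


verts = simplex_vertices(8)
print(math.dist(verts[0], verts[1]))           # edge length, approximately 1
print(sum_squared_distances_from_centroid(8))  # approximately (m-1)/2 = 3.5
print(log_condition_penalty(8, 20))            # approximately 100*(m-1) = 700
print(log_condition_penalty(7, 20))            # approximately 600
print(math.exp(log_condition_penalty(8, 20) - log_condition_penalty(7, 20)))
```

      The last line evaluates exp(100), about 2.7∙10<sup>43</sup>, the factor discussed in the text.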

      This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      That shows only that the results are not extremely sensitive to c<sub>max</sub> or K.  They are, nonetheless, exquisitely sensitive to m and v.  This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c<sub>max</sub>, a change large enough to have a major effect on the results.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v=20.  As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v.  This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions.  Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SD].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable.  I will therefore describe how my conclusions about sensitivity to parameter values can be verified using the simulation code provided by Siljestam and Rueffler themselves, with only small, easily understood modifications.  I will consider adding this description as a supplement when I revise the manuscript.

      The starting point is the Matlab file MHC_sim_Dryad.m, available at https://doi.org/10.5061/dryad.69p8cz98j.  First, we can add a line that prints the value of the variable logcmax, which represents the natural logarithm of cmax determined and used by the code.  Below line 116 (‘prework’), add the line ‘logcmax’ (with no semicolon).

      Now, at the Matlab prompt, execute MHC_sim_Dryad(false, 8, 20, 1) to run the simulation for the Gaussian model with m=8, v=20, and K=1.  The output will indicate that logcmax=700, in accord with the theoretical factor exp(100*(m-1)) derived above.  The allelic diversity, n<sub>e</sub>, will rise to a steady-state level of about 140, as in the red curve of my Fig. 2.

      Now lower m to 7, i.e., run MHC_sim_Dryad(false, 7, 20, 1).  The output will indicate that logcmax=600.  This confirms that lowering m by 1 causes the code to lower the value of c<sub>max</sub> by a factor exp(100)=2.7∙10<sup>43</sup>, which must also be the factor by which the condition of the most fit homozygote would increase without this adjustment.

      With the change of m to 7 and the compensatory change in c<sub>max</sub>, steady-state allelic diversity remains high.  But what if m changes but c<sub>max</sub> remains the same, as it would in reality?

      To find out, we can fix the value of c<sub>max</sub> to the value used with m=8 by adding the following line below the line previously added: ‘logcmax = 700’.  With this additional modification in place, executing MHC_sim_Dryad(false, 7, 20, 1) confirms that without a compensatory change to c<sub>max</sub>, lowering m from 8 to 7 mostly eliminates allelic diversity, in accord with the corresponding curve in my Fig. 2.  Similarly, raising m from 8 to 9, or changing v from 20 to 19.5 or 20.5 (executing MHC_sim_Dryad(false, 8, 19.5, 1) or MHC_sim_Dryad(false, 8, 20.5, 1)), largely eliminates diversity, confirming the other results in my Fig. 2.  Results for the bitstring model can also be confirmed, though this requires additional changes to the code.

      Thus, the extreme sensitivity of the results of Siljestam and Rueffler to parameter values can be verified with the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”.

      Response to Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem.  However, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>.  Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”.  Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>).  In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I had hoped that the final paragraph of the Discussion would make the basis for the title clear.  I will consider whether this can be clarified in a revision.

    1. Duryea-Code Disease Foundation has spent millions telling the world that people like my father don't exist

      This shows that there are power dynamics among those who have the disease: more severe cases, such as the narrator's father, get discarded from activism.

    1. The necessary consequence of this was political centralization. Independent or but loosely connected provinces, with separate interests, laws, governments, and systems of taxation, became lumped together into one nation, with one government, one code of laws, one national class-interest, one frontier, and one customs tariff

      political centralization


    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work computationally characterized the threat-reward learning behavior of mice in a  recent study (Akiti et al.), which had prominent individual differences. The authors  constructed a Bayes-adaptive Markov decision process model and fitted the behavioral data  by the model. The model assumed (i) hazard function starting from a prior (with free mean  and SD parameters) and updated in a Bayesian manner through experience (actually no real  threat or reward was given in the experiment), (ii) risk-sensitive evaluation of future  outcomes (calculating lower 𝛼 quantile of outcomes with free 𝛼 parameter), and (iii) heuristic  exploration bonus. The authors found that (i) brave animals had more widespread hazard  priors than timid animals and thereby quickly learned that there was in fact little real threat,  (ii) brave animals may also be less risk-aversive than timid animals in future outcome  evaluation, and (iii) the exploration bonus could explain the observed behavioral features,  including the transition of behavior from the peak to steady-state frequency of bout. Overall,  this work is a novel interesting analysis of threat-reward learning, and provides useful  insights for future experimental and theoretical work. However, there are several issues that I  think need to be addressed.

      Strengths:

      (1) This work provides a normative Bayesian account for individual differences in  braveness/timidity in reward-threat learning behavior, which complements the analysis by  Akiti et al. based on model-free threat reinforcement learning.

      (2) Specifically, the individual differences were characterized by (i) the difference in the  variance of hazard prior and potentially also (ii) the difference in the risk-sensitivity in the  evaluation of future returns.

      Weakness:

      (1) Theoretically the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, but these two effects could not be teased apart in the  fitting analysis of the current data.

      (2) It is currently unclear how (whether) the proposed model corresponds to neurobiological (rather than behavioral) findings, different from the analysis by Akiti et al.

      We thank reviewer #1 for their useful feedback which we’ve used to improve the discussion,  formatting and clarity of the paper, and for highlighting important questions for future  extensions of our work.

      Major points:

      (1) Line 219

      It was assumed that the exploration bonus was replenished at a steady rate when the animal  was at the nest. An alternative way would be assuming that the exploration bonus slowly  degraded over time or experience, and if doing so, there appears to be a possibility that the  transition of the bout rate from peak to steady-state could be at least partially explained by  such a decrease in the exploration bonus.

      Section 2.2.3 explains the mechanism of the exploration bonus which motivates approach.  We think that the mechanism suggested by the reviewer is, in essence, what is happening in  the model. The exploration pool is indeed depleted over time or bouts of experience at the  object. In the peak confident phase for brave animals and the peak cautious phase for timid  animals, the rate of depletion exceeds the rate of regeneration, since the agent spends only  a single turn at the nest between bouts. In the steady-state phase, the exploration pool has  depleted so much previously that the agent must wait multiple turns at the nest for the pool  to regenerate to a sufficiently high value to justify approaching the object again.

      We have updated section 2.2.3 to explain that agents spend one turn at the nest during peak  phase but multiple turns during steady-state phase. Hopefully, this makes our mechanism  clear:

      “In simulations, when 𝐺(𝑡) is high, the agent has a high motivation to explore the object, spending only a single turn in the nest state between bouts. In other words, the depletion from 𝐺0 substantially influences the time point at which approach makes a transition from peak to steady-state; the steady-state time then depends on the dynamics of depletion (when at the object) and replenishment (when at the nest). In particular, in the steady-state phases, the agent must wait multiple turns at the nest for 𝐺(𝑡) to regenerate so that informational reward once again exceeds the potential cost of hazard.”
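
      The qualitative mechanism in this passage (one nest turn between bouts while the pool is full, several once it has been depleted) can be illustrated with a deliberately simplified toy. This is not the authors' model: the update rules (additive regeneration at the nest, proportional depletion at the object, approach whenever the pool exceeds a fixed hazard cost), the function name, and all parameter values are assumptions made purely for illustration.

```python
def nest_turns_between_bouts(g0, regen, depletion, hazard_cost, n_bouts):
    """Toy exploration pool G(t).

    Each turn at the nest adds `regen` to the pool. The agent approaches
    the object as soon as the pool exceeds `hazard_cost`; each bout at the
    object removes a fraction `depletion` of the pool. Returns the number
    of nest turns preceding each bout. Requires regen > 0 to terminate.
    """
    g = g0
    waits = []
    for _ in range(n_bouts):
        turns = 0
        while True:
            turns += 1
            g += regen              # replenishment while at the nest
            if g > hazard_cost:     # informational reward outweighs hazard
                break
        waits.append(turns)
        g *= (1.0 - depletion)      # depletion during the bout at the object
    return waits


waits = nest_turns_between_bouts(g0=10.0, regen=0.5, depletion=0.5,
                                 hazard_cost=2.0, n_bouts=12)
print(waits)  # [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
```

      In this toy, the early bouts are separated by a single nest turn (a "peak" phase), after which the agent needs two turns per bout (a "steady-state" phase); a larger starting pool g0 delays the transition.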

      (2) Line 237- (Section 2.2.6, 2.2.7, Figures 7, 9)

      I was confused by the descriptions about nCVaR. I looked at the cited original literature  Gagne & Dayan 2022, and understood that nCVaR is a risk-sensitive version of expected  future returns (equation 4) with parameter α (α-bar) (ranging from 0 to 1) representing risk  preference. Line 269-271 and Section 4.2 of the present manuscript described (in my  understanding) that α was a parameter of the model. Then, isn't it more natural to report  estimated values of α, rather than nCVaR, for individual animals in Section 2.2.6, 2.2.7,  Figures 7, 9 (even though nCVaR monotonically depends on α)? In Figures 7 and 9, nCVaR  appears to be upper-bounded to 1. The upper limit of α is 1 by definition, but I have no idea why nCVaR was also bounded by 1. So I would like to ask the authors to add more detailed  explanations on nCVaR. Currently, CVaR is explained in Lines 237-243, but actually, there is  no explanation about nCVaR rather than its formal name 'nested conditional value at risk' in  Line 237.

      Thank you for pointing out this error. We have corrected the paper to use nCVaR to refer to  the objective and nCVaR's α, or sometimes just α, to refer to the risk sensitivity parameter  and thus the degree of risk sensitivity.
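
      To make the role of α concrete, the sketch below computes a plain (non-nested) lower-tail CVaR over equally weighted sampled outcomes, i.e. the mean of the worst α fraction. It is an illustration under stated assumptions, not the nCVaR objective of the paper: the nested, sequential version from Gagne & Dayan is not reproduced, and the function name and sample values are hypothetical.

```python
def lower_tail_cvar(outcomes, alpha):
    """Mean of the worst `alpha` fraction of equally weighted outcomes.

    alpha = 1 recovers the ordinary mean (risk neutral); smaller alpha
    puts more weight on the worst outcomes (more risk averse).
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    weight = 1.0 / len(outcomes)    # probability mass of each sample
    remaining = alpha               # tail mass still to be filled
    total = 0.0
    for x in sorted(outcomes):      # worst outcomes first
        take = min(weight, remaining)
        total += take * x
        remaining -= take
        if remaining <= 0.0:
            break
    return total / alpha


returns = [0.0, 1.0, 2.0, 3.0]
print(lower_tail_cvar(returns, 1.0))   # 1.5, the plain mean
print(lower_tail_cvar(returns, 0.5))   # 0.5, mean of the two worst outcomes
print(lower_tail_cvar(returns, 0.25))  # 0.0, the single worst outcome
```

      The value is bounded above by the mean and decreases as α shrinks, which is why α, rather than the objective's value, is the natural quantity to report as risk sensitivity.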

      (3) Line 333 (and Abstract)

      Given that animals' behaviors could be equally well fitted by the model having both nCVaR (free α) and hazard prior and the alternative model having only hazard prior (with α = 1), may it be difficult to confidently claim that brave (/timid) animals had risk-neutral (/risk-aversive) preference in addition to widespread (/low-variance) hazard prior? Then, it might be good to somewhat weaken the corresponding expression in the Abstract (e.g., add 'potentially also' to the result for risk sensitivity) or mention the inseparability of risk sensitivity and prior belief pessimism (e.g., "... although risk sensitivity and prior belief pessimism could not be teased apart").

      Thank you for this suggestion; we have duly weakened the wording in the Abstract to say “potentially more risk neutral”:

      “Some animals begin with cautious exploration, and quickly transition to confident approach  to maximize exploration for reward; we classify them as potentially more risk neutral, and  enjoying a flexible hazard prior. By contrast, other animals only ever approach in a cautious  manner and display a form of  self-censoring; they are characterized by potential risk  aversion and high and inflexible hazard priors.”

      Reviewer #2 (Public Review):

      Shen and Dayan build a Bayes adaptive Markov decision process model with three key components: an adaptive hazard function capturing potential predation, an intrinsic reward function providing the urge to explore, and a conditional value at risk (CVaR, closely related to probability distortion explanations of risk traits). The model itself is very interesting and has many strengths including considering different sources of risk preference in generating behavior under uncertainty. I think this model will be useful to consider for those studying approach/avoid behaviors in dynamic contexts.

      The authors argue that the model explains behavior in a very simple and unconstrained behavioral task in which animals are shown novel objects and retreat from them in various manners (different body postures and patterns of motor chunks/syllables). The model itself does capture lots of the key mouse behavioral variability (at least on average on a mouse-by-mouse basis) which is interesting and potentially useful. However, the variables in the model - and the internal states it implies the mice have during the behavior - are relatively unconstrained given the wide range of explanations one can offer for the mouse behavior in the original study (Akiti et al). This reviewer commends the authors on an original and innovative expansion of existing models of animal behaviour, but recommends that the authors revise their study to reflect the obvious challenges. I would also recommend a reduction in claiming that this exercise gives a normative-like or at least quantitative account of mental disorders.

      We thank reviewer #2 for highlighting some of the strengths of our paper as well as pointing  out important limitations of Akiti et al’s original study which we’ve inherited as well as some  limitations of our own method. We address their concerns below.

      We have added a paragraph to the discussion on the limitations of the state representation we adopted from Akiti’s study.

      (Reviewer #1 had the same concern, see above) “Motivated by tail-behind versus  tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious  and confident approach states [...]”

      We have reduced the suggestion that our model provides an account of mental disorders in  the abstract.

      Before:

      “On the other hand, “timid” animals, characterized by risk aversion and high and inflexible  hazard priors, display self-censoring that leads to the sort of asymptotic maladaptive  behavior that is often associated with psychiatric illnesses such as anxiety and depression.”

      After:

      “By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      My main comment is that this paper is a very nice model creation that can characterize the heterogeneity of rodent behavior in a very simple approach/avoid context (Akiti et al; when a novel object is placed in an arena) that itself can be interpreted in a multitude of ways. The use of terms like "exploration", "brave", etc in this context is tricky because the task does not allow the original authors (Akiti et al) to quantify these "internal states" or "traits" with the appropriate level of quantitative detail to say whether this model is correct or not in capturing the internal states that result in the rodent behavior. That said, the original behavioral setup is so simple that one could imagine capturing the behavioral variability in multiple ways (potentially without evoking complex computations that the original authors never showed the mouse brain performs). I would recommend reframing the paper as a new model that proposes a set of internal states that could give rise to the behavioral heterogeneity observed in Akiti et al, but nonetheless is at this time only a hypothesis. Furthermore, an explanation of what would be really required to test this would be appreciated to make the point clearer.

      We thought very hard about using terms that might be considered to be anthropomorphic such as ‘timid’ and ‘brave’. We are, of course, aware of the concerns articulated by investigators such as LeDoux about this. However, we think that, provided that we are clear on the first appearance (using ‘scare’ quotes) that we are using them as labels for latent characteristics that capture correlations in various aspects of behaviour, they are more helpful than harmful in making our descriptions understandable.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript presents computational modelling of the behaviour of mice during encounters with novel and familiar objects, originally reported by Akiti et al. (Neuron 110, 2022). Mice typically perform short bouts of approach followed by a retreat to a safe distance, presumably to balance exploration to discover possible rewards with the potential risk of predation. However, there is considerable heterogeneity in this exploratory behaviour, both across time as an individual subject becomes more confident in approaching the object, and across subjects; with some mice rapidly becoming confident to closely explore the object, while other timid mice never become fully confident that the object is safe. The current work aims to explain both the dynamics of adaptation of individual animals over time, and the quantitative and qualitative differences in behaviour between subjects, by modelling their behaviour as arising from model-based planning in a Bayes adaptive Markov Decision Process (BAMDP) framework, in which the subjects maintain and update probabilistic estimates of the uncertain hazard presented by the object, and rationally balance the potential reward from exploring the object with the potential risk of predation it presents.

      In order to fit these complex models to the behaviour, the authors necessarily make substantial simplifying assumptions, including coarse-graining the exploratory behaviour into phases quantified by a set of summary statistics related to the approach bouts of the animal. Inter-individual variation between subjects is modelled both by differences in their prior beliefs about the possible hazard presented by the object and by differences in their risk preference, modelled using a conditional value at risk (CVaR) objective, which focuses the subject's evaluation on different quantiles of the expected distribution of outcomes. Interestingly, these two conceptually different possible sources of inter-subject variation in brave vs timid exploratory behaviour turn out not to be dissociable in the current dataset, as they can largely compensate for each other in their effects on the measured behaviour. Nonetheless, the modelling captures a wide range of quantitative and qualitative differences between subjects in the dynamics of how they explore the object, essentially through differences in how subjects' beliefs about the potential risk and reward presented by the object evolve over the course of exploration, and are combined to drive behaviour.
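
      The idea that a CVaR objective "focuses the subject's evaluation on different quantiles" can be illustrated with a minimal sketch. It assumes the lower-tail convention implied elsewhere in this response (α = 1 is risk neutral, α < 1 is risk averse) and a single-step discrete outcome distribution; the function name and the example numbers are hypothetical and are not taken from the paper's code, which uses the nested (nCVaR) formulation over time.

```python
def cvar_lower_tail(outcomes, probs, alpha):
    """Average outcome over the worst alpha-fraction of a discrete distribution.

    alpha = 1 recovers the ordinary expectation (risk neutral);
    smaller alpha weights only the worst outcomes (risk averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha          # tail probability mass still to be filled
    total = 0.0
    # Visit outcomes from worst to best, taking mass until alpha is used up.
    for value, p in sorted(zip(outcomes, probs)):
        take = min(p, remaining)
        total += take * value
        remaining -= take
        if remaining <= 0.0:
            break
    return total / alpha


# Hypothetical bout outcomes: rare large loss, neutral, modest reward.
outcomes = [-10.0, 0.0, 2.0]
probs = [0.1, 0.3, 0.6]
for alpha in (1.0, 0.5, 0.2):
    print(alpha, cvar_lower_tail(outcomes, probs, alpha))
```

      Lowering α makes the same outcome distribution look worse, which is one way a risk-averse agent and an agent with a pessimistic hazard prior can end up behaving alike.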

      Exploration in the face of risk is a ubiquitous feature of the decision-making problem faced  by organisms, with strong clinical relevance, yet remains poorly understood and  under-studied, making this work a timely and welcome addition to the literature.

      Strengths:

      (1) Individual differences in exploratory behaviour are an interesting, important, and  under-studied topic.

      (2) Application of cutting-edge modelling methods to a rich behavioural dataset, successfully accounting for diverse quantitative and qualitative features of the data in a normative framework.

      (3) Thoughtful discussion of the results in the context of prior literature.

      Limitations:

      (1) The model-fitting approach used of coarse-graining the behaviour into phases and fitting  to their summary statistics may not be applicable to exploratory behaviours in more complex  environments where coarse-graining is less straightforward.

      (2) Some aspects of the work could be more usefully clarified within the manuscript.

      We thank reviewer #3 for their positive feedback and for helping us to improve the clarity of our paper. We have added the discussion they thought was missing.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25-28

      This part of the Abstract might give an impression that timidity (but not braveness) is  potentially associated with psychiatric illness and even that timidity is thus inferior to  braveness. However, even though extreme timidity might indeed be associated with anxiety  or depression, extreme braveness could also be associated with other psychiatric or  behavioral problems. Moreover, as a population, the existence of both timid and brave  individuals could be advantageous, and it could be a reason why both types of individuals  evolutionarily survived in the case of wild animals (although Akiti et al. used mice, which may  have no or very limited genetic varieties, and so things may be different). So I would like to  encourage the authors to elaborate on the expression of this part of the Abstract and/or  enrich the related discussion in the Discussion.

      This is an important point. We note on line 38 that excessive novelty seeking (potentially  caused by excessive braveness) could also be maladaptive.

      Additionally, we have added a paragraph to the discussion discussing heterogeneity in risk  sensitivity within a population.

      “Our data show that there is substantial variation in the degrees of risk sensitivity across the mice. Previous works have reported substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which depend on gender, age, socioeconomic status, personality characteristics, wealth and culture (Rieger et al., 2015; Frey et al., 2017). Despite the normative appeal of 𝛼 = 1, it is possible that a population may benefit from including individuals with 𝛼 different from 1.0 or highly negative priors. For example, more cautious individuals could learn from merely observing the risky behavior of less cautious individuals. Furthermore, we have only considered risk-sensitivity under epistemic uncertainty in our work. Risk-averse individuals, for instance with 𝛼 < 1, may be more successful than risk-neutral agents in environments where there are unexpected dangers (unknown unknowns). Risk-aversion is thus a temperament of ecological and evolutionary significance (Réale et al., 2007).”

      (2) Line 149

      Section 2.2 consists of eight subsections. I think this organization may not be very  appealing, because there are a bit too many subsections, and their relations are not  immediately clear to readers. So I would like to encourage the authors to make an  elaboration. For example, since 2.2.1 - 2.2.5 describes a summary of model construction  and model fitting whereas 2.2.6-2.2.8 shows the results, it could be good to divide these into  separate sections (2.2.1 - 2.2.5 and 2.3.1 - 2.3.3).

      Thank you for pointing this out. We’ve renumbered the sections as you’ve suggested.

      (3) Line 347-8

      Theoretically, the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, as the authors mentioned in Lines 393-394. Then isn't it  possible to consider environments/conditions in which the two effects can be separated?

      We appreciate this suggestion. Indeed, our original thought in modeling this experiment was  that this would be exactly the case here - with epistemic uncertainty reducing as the object  became more familiar. However, proving to an animal that a single environment is  completely stationary/fixed is hard - reflected in our conclusion here that the exploration  bonus pool replenishes. Thus, we argued in the discussion that a series of environments  would be necessary to separate risk sensitivity from priors.
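
      The reviewer's theoretical point, that the influence of a prior is diluted by experience, can be sketched with a toy conjugate model. This is an illustration only: it assumes a Beta prior over a single per-visit hazard probability and purely safe observations, which is not the paper's hazard function, and all names and numbers are hypothetical.

```python
def posterior_hazard_mean(prior_danger, prior_safe, n_safe, n_danger=0):
    """Posterior mean hazard probability under a Beta(prior_danger, prior_safe) prior."""
    return (prior_danger + n_danger) / (prior_danger + prior_safe + n_safe + n_danger)


def prior_gap(n_safe):
    """Difference in believed hazard between a pessimistic and an optimistic agent."""
    pessimistic = posterior_hazard_mean(8.0, 2.0, n_safe)
    optimistic = posterior_hazard_mean(2.0, 8.0, n_safe)
    return pessimistic - optimistic


# The gap between the two agents shrinks as safe visits accumulate.
for n in (0, 10, 90):
    print(n, round(prior_gap(n), 3))
```

      A fixed risk-averse evaluation (𝛼 < 1) would not shrink in this way, which is why a stationary environment could in principle separate the two effects; the replenishing exploration bonus described above is what prevents this in the present data.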

      (4) Line 407

      It would be nice to add a brief phrase explaining how (in what sense) this model's  assumption was consistent with the reported behavior. Also, should the assumption of  having two discrete approach states (cautious and confident) itself be regarded as a  limitation of the model? If the tail-behind and tail-exposure approaches were not merely  operationally categorized but were indicated to be two qualitatively distinct behaviors in the  experiment by Akiti et al., it is reasonable to model them as two discrete states, but  otherwise, the assumption of two discrete states would need to be mentioned as a  simplification/limitation.

      We have now removed line 407, and have added a paragraph to the discussion on the limitations of the tail-behind and tail-exposure state representation:

      “Motivated by tail-behind versus tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious and confident approach states. This is likely a crude approximation to the continuous and multifaceted nature of animal approach behavior. For example, during approach animals likely adjust their levels of vigilance continuously (or discretely; Lloyd and Dayan (2018)) to monitor threat, and choose different velocities for movement, and different attentional strategies for inspecting the novel object. We hope future works will model these additional behavioral complexities, perhaps with additional internal states, and corroborate these states with neurobiological data.”

      (5) Line 418

      The authors contrasted their model-based analyses with the model-free analyses of Akiti et  al. Another aspect of differences between the authors' model and the model of Akiti et al. is  whether it is normative or mechanistic: while how the model of Akiti et al. can be biologically  implemented appears to be clear (TS dopamine represents threat TD error, and TS  dopamine-dependent cortico-striatal plasticity implements TD error-based update of  model-free threat prediction), biological implementation of the authors' model seems more  elusive. Given this, it might be a fruitful direction to explore how these two models can be  integrated in the future.

      We enthusiastically agree that it would be most interesting in the future to explore the integration of the two models - and, in the discussion (Lines 537-548, 454-461), point to some first steps that might be fruitful along these lines. There are two separate considerations here: one is that our account is mostly computational and algorithmic, whereas Akiti’s model is mostly algorithmic and implementational; the second is, as noted by the reviewer, that our account is model-based, whereas Akiti’s model is model-free (in the sense of reinforcement learning; RL). These are related - thanks in no small part to the work from the group including Akiti, we know a lot more about the implementation of model-free than model-based RL. However, our model-based account does account for additional features of behavior not captured in Akiti et al.’s model, such as bout duration, frequency, and approach type. Hence the temptation of unification.

      (6) Line 426

      Related to the previous point, it would be nice to more specifically describe what variable TS  dopamine can represent in the authors' model if possible.

      In the discussion (Lines 454-461), we speculate that TS dopamine could still respond to the physical salience of the novel object and affect choices by determining the potential cost of the encountered threat or the prior on the hazard function. For example, perhaps ablating TS dopamine reduces the hazard priors, which leads to faster transition from cautious to confident approach and longer bout durations, consistent with the optogenetics behavioral data reported in Akiti et al.

      Reviewer #2 (Recommendations for the authors):

      My guess is that simpler versions of the model would not fit the data well. But this does not mean, for example, that the mice have probability distortions (CVaR) or even that probabilistic reasoning and the internal models necessary to support it are acting in the behavioral context studied by Akiti. So, related to the above, I would ask what other models would fit and would not fit the data? And what does this mean?

      These are good points. Our model provides an approximately normative account of the animals’ behavior in terms of what it achieves relative to a utility function. In practice, the animals could deploy a precompiled model-free policy (which does not rely on probabilistic computations) that is exactly equivalent to our model-based policy. With the current experiment, we cannot conclude whether or not the animals are performing the prospective calculations in an online manner. Of course, the extent to which animals or humans perform probabilistic computations online and have internal models remains an ongoing question of study.

      Model comparison is difficult because we currently do not know of any other risk-sensitive exploration models. We cannot directly compare to the model in Akiti et al. since our model explains additional features of behavior: bout duration, frequency, and approach type. Indeed, our model is as simple as it can be in the sense that, with the exception of nCVaR, removing any of the other parameters makes it difficult to fit some animals in our dataset. In the future, our model could be used to fit other datasets of risk-sensitive exploration and, ideally, be compared to other models.

      Explaining why animals avoid the novel object in what the authors call a benign environment is a very tricky issue. In Akiti et al, the readers are not yet convinced that the mice know that this environment is benign. Being placed in an arena with a novel object presents mice with great uncertainty, and we do not know whether they treat this as benign. Therefore, the alternative explanations in this study need to be carefully discussed in light of the limitations of the initial study.

      It is certainly true that it is unclear if the arena is completely benign to the animals. However, the amount of time the animal spends in the center of the arena decreases significantly from habituation to novelty days. This suggests that the animals avoid the novel object largely because of the object itself, rather than the potential danger associated with the arena. Furthermore, the animals are not reported as exhibiting more extreme behaviours such as freezing. In any case, our account is relative in the sense that we are comparing the time the animal spends at the object versus elsewhere in the environment, driven by the relative novelty and relative risk of the environment versus the object. Trying to get more absolute measures of these quantities would require a richer experimental set-up, for instance with different degrees of habituation or experience of the occurrence of (other) novel objects in general.

      We added a short note to the discussion to explain this:

      “Fourth, we modeled the relative amount of time the animal spends at the object versus  elsewhere in the environment which depends on the differential risk in the two states.  However, it is likely the animals avoid the novel object largely because of the object itself,  rather than the potential danger associated with the arena since they spend much less time  at the center of the arena during novelty than habituation days.”

      Figure 2 - how confident are the authors that each mouse differs from y=1? Related to this,  the behavior in Akiti is very noisy and changes across time. I am not sure if the authors fully  describe at what levels their model captures the behavior vs not in a detailed enough  fashion.

      We have performed a random permutation test on the minute-to-minute data. We have updated Figure 2 so that brave animals that pass the Benjamini–Hochberg procedure for y>1 at level q=0.05 are represented with solid green dots and animals that do not pass are represented with hollow dots. 8 out of 11 brave animals passed the Benjamini–Hochberg procedure.
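
      For readers unfamiliar with the Benjamini–Hochberg step-up procedure mentioned here, a minimal sketch follows. It assumes one p-value per animal has already been obtained (for instance from a permutation test); the function name and the example p-values are hypothetical and do not reproduce the values behind Figure 2.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if rejected at false discovery rate q."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-animal p-values for the test of y > 1.
example_p = [0.001, 0.008, 0.039, 0.041, 0.2]
print(benjamini_hochberg(example_p, q=0.05))
```

      Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.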

      Reviewer #3 (Recommendations for the authors):

      (1) I could not find information in the preprint about code availability. Please consider making  the code public to help others apply these modelling methods.

      We have released code and included the url in the paper in the Methods section.

      (2) Though the manuscript was generally clearly written, there were a number of places  where some additional information or clarification would be useful:

      a) Please define and explain the terms 'tail-behind' and 'tail-exposed' (used to describe  approach bout types) when first used.

      We have added definitions when we first mention these terms:

      “[...] 'tail-behind' (bouts where the animal's nose was closer to the object than the tail for the  entire bout) and 'tail-exposed' (bouts where the animal's tail is closer to the object than the  nose at some point during the bout), associated respectively with cautious risk-assessment  and engagement”

      b) At lines 57-58 when contrasting the 'model-free' account of Akiti et al with the 'model-based' account of the current work, it would be worth clarifying that these terms are  being used in the RL sense rather than e.g. a model-based analysis of the data.  

      We have updated the relevant lines to say “model-free/based reinforcement learning”.

      c) Line 61, the phrase 'the significant long-run approach of timid animals despite having  reached the "avoid" state' is unclear as the 'avoid' state has not been defined.

      We updated the terminology to “avoidance behavior” to be consistent with Akiti et al.  Avoidance refers to the animal routinely avoiding the object and therefore being unable to  learn whether it is safe.

      d) It was not completely clear to me how the coarse-graining of the behaviour was  implemented. Specifically, how were animals assigned to the brave, intermediate, or timid  group, and how were the parameters of the resulting behavioural phases fit?

      Sorry that this was not clear. Section 2.1 explains how the minute-to-minute behavioral data  was coarse-grained and how animal groups were assigned. We have added further  explanation of Figure 2 to the main text:

      “Fig 2 summarizes our categorization of the animals into the three groups: brave, intermediate, and timid based on the phases identified in the animal's exploratory trajectories. Timid animals spend no time in confident approach and are plotted in orange at the origin of Fig 2. Brave animals differ from intermediate animals in that their approach time during the first ten minutes of the confident phase is greater than during the last ten minutes (steady-state phase). Brave animals are plotted in green above and intermediate animals are plotted in black below the y=1 line in Fig 2.”
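
      The grouping rule described in this passage can be written as a short sketch. It assumes three per-animal summary numbers are available (total time in confident approach, and approach time in the first and last ten minutes of the confident phase); the function and argument names are hypothetical, and treating an exact tie as intermediate is an assumption, since the text only specifies the cases above and below the y=1 line.

```python
def classify_animal(confident_time, early_approach_time, late_approach_time):
    """Assign an animal to 'timid', 'intermediate', or 'brave'.

    confident_time: total time spent in confident approach.
    early_approach_time: approach time in the first ten minutes of the confident phase.
    late_approach_time: approach time in the last ten minutes (steady-state phase).
    """
    if confident_time == 0:
        return "timid"  # never enters confident approach
    # Comparing the two times directly is equivalent to testing early/late > 1
    # and avoids dividing by zero.
    if early_approach_time > late_approach_time:
        return "brave"
    return "intermediate"  # assumption: ties are counted as intermediate


print(classify_animal(0.0, 0.0, 0.0))
print(classify_animal(120.0, 40.0, 25.0))
print(classify_animal(120.0, 20.0, 25.0))
```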

      We also added extra information to outline the goal and methodology of coarse-graining and animal grouping:

      “We sought to capture these qualitative differences (cautious versus confident) as well as aspects of the quantitative changes in bout durations and frequencies as the animal learns about their environment. To make this readily possible, we abstracted the data in two ways: averaging bout statistics over time, and clustering the animals into three groups with operationally distinct behaviors.”

      e) What purpose does the 'retreat' state serve in the BAMDP model (as opposed to  transitioning directly from 'object' to 'nest' states), and why do subjects not pass through it  following 'detect' states?

      Thank you for pointing this out. We have updated Figure 3 to note that the two “detected  states” also point to the “retreat” state. The reviewer is correct that there could be alternative  versions of the state diagram, and the ‘retreat’ state could indeed have been eliminated.  However, we thought that it was helpful to structure the animal’s progress through state  space.

      f) Why was the hazard function parameterised via the mean and SD at each time step rather  than with a parametric form of the mean and SD as a function of time?

      Since the agent can only spend 2, 3, or 4 turns at the object states, we didn’t see a need to  parameterize the mean and SD as a function of time. Doing so is a good solution to scaling  up the hazard function to more time-steps.

      (3) There were also a couple of points that could potentially be usefully touched on in the  discussion:

      a) What, if any, is the relationship between the CVaR objective and distributional RL? They  seem potentially related due to both focussing on quantiles of the outcome distribution.

      We have added a paragraph to the discussion discussing the connection between  distributional RL and CVaR:

      “CVaR is known to come in different flavors in the case of temporally-extended behavior. Gagne and Dayan (2021) introduce two alternative time-consistent formulations of CVaR: nested CVaR (nCVaR) and precommitted CVaR (pCVaR). nCVaR and pCVaR both enjoy Bellman equations which make it possible to compute approximately optimal policies without directly computing whole distributions of the outcomes. We use nCVaR in this study for its computational efficiency. There is, of course, great current interest in distributional reinforcement learning (Bellemare et al., 2023b) which does acquire such whole distributions, not the least because of prominent observations linking non-linearities in the response functions of dopamine neurons to methods for learning distributions of outcomes (Dabney et al., 2020; Masset et al., 2023; Sousa et al., 2023). One functional motivation for considering entire outcome distributions is the possibility of using them to determine risk-sensitive policies (Gagne and Dayan, 2021).

      While it is possible to compute CVaR directly from return distributions, Gagne and Dayan  (2021) showed that this can lead to temporally inconsistent policies where the agent  deviates from its original plans (the authors called this the fixed CVaR or fCVaR measure).

      Rather further removed from our model-based methods is work from Antonov and Dayan  (2023), who consider a model-free exploration strategy which exploits full return distributions  to compute the value of perfect information which is used as a heuristic for trying actions  with uncertain consequences. Future works can examine risk-sensitive versions of Antonov  and Dayan (2023)'s computationally efficient model-free algorithm as one solution to the  burdensome computations in our model-based method.”

      b) Why normatively might subjects have non-neutral risk preference as captured by the CVaR?

      We also added a paragraph to the discussion discussing the advantage of heterogeneity in  risk sensitivity within a population:

      (Reviewer #1 had the same question, see above) “Our data show that there is substantial  variation in the degrees of risk sensitivity across the mice.  Previous works have reported  substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which  depend on gender, age, socioeconomic status, personality characteristics, wealth and culture [...]”

      c) Relevance of the current modelling work to clinical conditions characterised by dysregulation of risk assessment (e.g. anxiety or PTSD).

      We’ve added a paragraph to the discussion:

      “Inter-individual differences in risk sensitivity are also of critical importance in psychiatry, reflected in a panoply of anxiety disorders (Butler and Mathews, 1983; Giorgetta et al., 2012; Maner et al., 2007; Charpentier et al., 2017), along with worry and rumination (Gagne and Dayan, 2022). Understanding the spectrum of extreme priors and extreme values of 𝛼 could have therapeutic implications, adding significance to the search for tasks that can more cleanly separate them.”

      d) Is it surprising to see differences in risk preference (nCVaR) between the familiar object  and novel object condition, given that risk preference might be conceptualised as a trait  rather than a state variable?

      Thank you for raising this point. You are right that we expected risk sensitivity (nCVaR alpha)  to be the same between FONC and UONC animals on average. It is difficult to know if alpha  is higher for FONC than UONC animals due to the non-identifiability between alpha and  hazard priors. We have added this discussion to the paper:

      “This is surprising if we interpret 𝛼 as a trait that is stable through time. Unfortunately, due to  the non-identifiability between 𝛼 and hazard priors, we cannot verify whether 𝛼 is actually  higher for FONC animals than UONC animals.”

    1. Briefing: Ordinary Educational Violence (Violences Éducatives Ordinaires, VEO)

      Summary

      This briefing analyzes ordinary educational violence (VEO), drawing on the expertise of childhood professionals.

      It highlights the historical context, the definition, the neuroscientific impacts, and the societal challenges associated with these practices.

      VEO, the legacy of a millennia-long history of patriarchal domination, encompasses not only physical violence (slaps, spankings) but also psychological and verbal forms (humiliation, blackmail, shouting) that are trivialized and deeply rooted in child-rearing patterns.

      French law only very recently, in 2019, explicitly prohibited these practices, marking a break with a past in which the "right of correction" was legitimized.

      The impact of VEO on children is now documented by neuroscience: far from fostering obedience, the stress generated activates the brain's fear circuits, inhibiting the capacities for reasoning and cooperation.

      This undermines the child's fundamental "meta-need" for security, which is essential to their development.

      Today's parents find themselves in a complex "period of educational transition", seeking to abandon models handed down over generations.

      It is crucial to distinguish non-violent child-rearing from permissiveness: the challenge is to set a clear, predictable, and containing framework while establishing a dialogue based on trust and respect.

      This document details these concepts and lists the resources available to support families through this transition.

      I. Historical and Societal Context: From Domination to the Rights of the Child

      The notion of ordinary educational violence is intrinsically linked to a long history of domination and to the changing status of the child in society.

      A. The Patriarchal Legacy

      Roman Antiquity: The concept of the pater familias gave the head of the family absolute power, including the right of life and death over his children and slaves, in order to maintain a social order founded on domination.

      Napoleonic Civil Code (1804): This legacy was formalized in French law, which reaffirmed the father's "paternal power" (puissance paternelle) and "right of correction" over his children.

      Article 375 even allowed the father to have his offspring confined as a form of correction. Although more than two centuries old, this code still forms the basis of current civil law.

      19th and early 20th centuries: The father retained major coercive power and could decide to have a child deemed "rebellious" or disobedient confined in "houses of correction" or "agricultural penal colonies" (such as the one at Mettray in Indre-et-Loire), which were more akin to penal labor camps than to places of education.

      B. The Slow Emergence of Children's Rights

      The 20th century saw a gradual change in how the child was perceived, from an object of correction to a subject of rights.

      1935: Abolition of "paternal correction", ending the parental right of confinement.

      1945: The ordinance of 1945, in the post-war context, created juvenile judges (juges pour enfants) and laid the foundations of a modern justice system for minors, focused on protection and education rather than on coercion alone.

      1970: "Paternal power" was definitively abolished and replaced by parental authority (autorité parentale), which established equal rights and duties for the mother and the father. This was a major step, but the right of correction remained tolerated in practice.

      In 1982, an appeals court judge could still rule that spankings and blows with a ruler did not constitute "excessive brutality" if they left no marks.

      1989: The International Convention on the Rights of the Child (CIDE), ratified by France in 1990, finally recognized the child as a full subject of rights who must be protected from all forms of violence.

      C. The 2019 Law: A Belated Recognition

      Despite the CIDE, France took nearly 30 years to legislate specifically on VEO.

      2015: France is condemned by the Council of Europe for lacking a "sufficiently clear" law prohibiting corporal punishment.

      July 2019: The law is adopted, often pejoratively nicknamed the "anti-spanking law" ("loi anti-fessée"). Proposed by the deputy Maud Petit, it was widely mocked and met strong resistance, typified by the argument "I got slapped and it didn't kill me".

      Content of the law: It states concisely that "parental authority is exercised without physical or psychological violence".

      Introducing the notion of psychological violence is a fundamental advance, because it recognizes the invisible but deep impact of certain educational practices.

      II. Definition and Forms of Ordinary Educational Violence (VEO)

      VEO are defined as punitive and coercive practices that are commonplace and trivialized (hence "ordinary"), used in the name of education but devoid of any educational value, and that undermine the dignity and integrity of the child.

      They fall into three broad categories.

      | Type of violence | Concrete examples cited |
      | --- | --- |
      | Physical violence | Slaps, spankings, smacks on the hands, hair pulling, pinching, shaking, throwing objects, destroying toys, withholding food, forced isolation in a room. |
      | Psychological violence | Threats ("just you wait..."), guilt-tripping, emotional blackmail, education through fear, indifference (ignoring the child, especially when they cry), creating a climate of insecurity. |
      | Verbal violence | Humiliation, insults, shouting, belittling ("you're useless", "you'll never manage it"), comparisons (between siblings or with other children), mockery. |

      These practices are often automatic reactions by an adult who feels overwhelmed or powerless, and may reproduce educational patterns the adult experienced in their own childhood.

      III. The Current Situation and Societal Perception

      An IFOP survey conducted for the Fondation pour l'enfance in 2024 reveals a mixed evolution in attitudes since the 2019 law.

      • Decline in physical violence: The "anti-spanking" law appears to have had a positive impact, with a reported decrease in the use of corporal punishment.

      • Stagnation of psychological violence: Psychological and verbal violence is proving slow to decline, and some forms are even increasing.

      This reflects a difficulty in recognizing the significance of these acts and in changing deeply ingrained patterns of communication.

      • Parental resistance: A significant share of the parents surveyed still express reluctance toward the law, perceiving it as state interference in the private sphere ("what business is it of the state?").

      This "family privacy" argument has historically slowed the progress of legislation.

      IV. The Impact of VEO on Child Development

      Neuroscience helps explain why VEO are not only harmful but also counterproductive.

      A. The Brain's Response to Fear

      Faced with an adult perceived as threatening (shouting, abrupt gestures), the child's brain activates a survival mechanism.

      1. Perception of danger: The adult becomes a source of fright.

      2. Short-circuiting of reasoning: The fear signal is processed directly by the brain's archaic regions (the limbic system), which handle emotions and danger, bypassing the prefrontal cortex, the seat of reflection and learning.

      3. Instinctive reactions: The brain triggers one of the three primary responses to danger:

      ◦ Fight: Rare toward a parent.

      ◦ Flight: Avoidance.

      ◦ Freeze: The child is "paralysed", unable to act or react.

      The parent often misreads this freeze response as provocation.

      This process is accompanied by an overproduction of stress hormones such as cortisol, which in excess is harmful to brain development.

      B. Undermining the Fundamental Need for Security

      The "meta-need" for security is the cornerstone of child development, as recognized by the child protection law of March 2016.

      This need includes:

      • Physiological needs (sleep, food).

      • The need for stable and predictable emotional relationships.

      VEO create a fundamental insecurity: the child loses trust in their environment and in the figures who are supposed to protect them.

      C. Long-Term Risks

      Although there is no automatic causality, a child repeatedly exposed to VEO is more vulnerable to:

      School and learning difficulties: A child in a permanent state of alert is not available for learning.

      Low self-esteem: Belittling messages are internalized.

      Reproduction of patterns of violence: The child may become a perpetrator of violence or end up in the position of victim in adulthood (self-inflicted or suffered maltreatment).

      V. The Challenge of the "Educational Transition" for Parents

      Today's parents stand at the hinge between two models, which creates a period of "educational transition".

      Breaking the cycle of reproduction: Most adults were raised with VEO. Under stress, the tendency is to reproduce these patterns unconsciously. Becoming aware of this is the first step toward change.

      Repairing the relationship: It is never too late to come back to an incident. Acknowledging one's mistake to the child ("my reaction was out of proportion") and explaining the context without shifting the blame restores trust and shows the child a model of non-violent conflict management.

      Educating without violence is not permissiveness: This is a frequent confusion. The child has an essential need for a framework.

      This framework must be:

      ◦ **Clear and predictable**: The rules are known and consistent.

      ◦ **Containing**: It offers the emotional security that allows the child to explore the world.

      ◦ **Adapted and evolving**: It changes with the child's age and abilities, and can be discussed, particularly with a teenager.

      Sanction vs. punishment: A sanction, if proportionate to the act and to the child's age, can be educational, provided its aim is not to humiliate or dominate but to set a limit and allow for reparation.

      VI. Practical Cases and Guidelines

      Forcing a child to kiss a relative: This practice is a form of VEO because it does not respect the child's body or consent.

      It is a missed opportunity to teach the right to say "no", a crucial skill for abuse prevention. It is better to offer alternatives ("you can say hello another way").

      Forcing a child to eat or to taste: Force-feeding can lead to eating disorders.

      Using food as a bargaining chip (especially sugar as a reward) creates an unhealthy relationship with eating.

      The parent's role is to offer a varied diet, but the child must remain in control of what they ingest.

      Refusal of care (medication, tooth brushing):

      The child's health is non-negotiable, and the adult must set a firm framework. The approach is essential, however: explain why the care matters, stay calm and convinced, and use playful strategies to win the child's cooperation rather than resorting to force.

      VII. Available Resources and Support

      It is essential that parents not remain isolated with their difficulties. Many free and confidential services exist to offer a listening ear and support.

      Protection Maternelle Infantile (PMI, maternal and child protection services): For parents of children aged 0 to 6, in the Maisons Départementales de la Solidarité.

      Lieux d'Accueil Enfants-Parents (LAEP): Meeting and play spaces for parents and children aged 0 to 3.

      Maison des Adolescents: A place dedicated to young people and their parents.

      Espace Santé Jeune: Helpline and drop-in service for ages 7 to 25.

      Espaces Parents: New places offering reception, listening and activities for all parents.

      Specialized associations: Such as Les Établis, which offer prevention, listening and referral.

      Online resources: The site stopveo.fr (Violences Éducatives Ordinaires) offers articles, videos and testimonials.

    1. Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare against baseline methods such as "DANGO: Predicting higher-order genetic interactions". There is also a substantial body of work on pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models".

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should use the terminology consistently. Including a visual mapping or table linking the number of layers to the maximum modeled interaction order could be helpful.
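
      To illustrate the kind of mapping requested in point (4), here is a minimal sketch. It assumes that a model with M MHA layers captures specific interactions up to order exactly 2^M, which is the scaling mentioned in point (2); the function names are hypothetical and are not taken from the authors' code.

```python
def max_interaction_order(num_mha_layers: int) -> int:
    """Maximum specific epistatic order, ASSUMING the 2^M scaling with M MHA layers."""
    if num_mha_layers < 0:
        raise ValueError("number of MHA layers must be non-negative")
    return 2 ** num_mha_layers


def order_table(max_layers: int) -> list[tuple[int, int]]:
    """Rows of (number of MHA layers, maximum interaction order) for 1..max_layers."""
    return [(m, max_interaction_order(m)) for m in range(1, max_layers + 1)]


if __name__ == "__main__":
    # Print a small lookup table a reader could compare against the manuscript's groupings.
    print("MHA layers | max interaction order")
    for layers, order in order_table(3):
        print(f"{layers:>10} | {order}")
```

      Under this assumption, one layer would correspond to the "pairwise" group, two layers to interactions up to 4-way, and three layers to interactions up to 8-way; whether these boundaries match the manuscript's "3-4-way" and ">4-way" labels is exactly what the authors would need to confirm.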

  3. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. before the bot broke due to coding errors

      No need to add this! Do you want them to think of you as a programmer who builds unreliable code? Probably not lol

    1. In calling the structure of the chromosome fibres a code-script

      From where does he draw the idea of a "code-script"? Is it from the developing information theory of the time? Somewhere else?

      There is definitely the idea of a code running in the sense of programming, which was likely not a common conceptualization at the time.


      On p. 22 he uses the phrase "law-code" which is likely the closer meaning of code he's using and not the sense of genetic code as understood much later when DNA and the underlying protein coding sequences were unraveled.

      Morse code may also be a tangential underlying meaning of his sense of "code" as something unknown but potentially revealable.

    1. or example, Aghion et al. 2017 was among the top few for human reviewers, but the LLM overall score put it notably lower relative to others, hence a downward green curve.

      @Valentin I don't think that was the greatest discrepancy -- should we identify some with a greater discrepancy here? (Ideally, we even soft-code it, as this is likely to change as we adjust the prompts, anonymize, etc.)

    2. deed, GPT often noted lack of code or data sharing in papers and penalized for it, whereas some human reviewers may have been more forgiving or did not emphasize open-science practices as strongly (especially if they focused more on content quality). As a result, for many papers the AI’s Open Science score is 5–10 points below the human average.

      This is interesting. The human evaluators may have had low expectations because they don't expect the open code and data to be provided until the paper has been published in a peer-reviewed journal. Here I would agree more with the LLM, in the "what should be" sense.

    1. This article presents an open method for tracking journal articles under transformative agreements using open metadata. The authors apply their approach to the Dutch context as a case study, including validation against national research information. The well-written paper usefully highlights, in an accessible and easy-to-understand manner, how both researchers and practitioners evaluating this open access licensing model can navigate data gaps. By demonstrating that estimating publications under transformative agreements requires combining multiple data sources, the authors offer practical methodological insights for those interested in this prevalent licensing model but uncertain about data sources and their limitations. They also highlight the progress made in increasing the transparency of transformative agreements, which has often been lacking in previous subscription agreements between libraries and publishers.

      In my view, the key contribution of this study is the validation using Dutch research information. This makes it possible to show that, while many articles could be matched, there are shortcomings that do not reflect weaknesses in the open method itself, but rather the current state of the data infrastructure for transparency around transformative agreements and open access. Particularly noteworthy is the finding that there are challenges in using corresponding author information as a proxy to delineate open access funding. While it is already known that open metadata (here OpenAlex) on corresponding authors is not as complete as proprietary databases, the validation also reveals that even when corresponding author data are available, issues can arise, particularly with multiple affiliations. The availability of funding information faces similar limitations.

      This leads me to wonder whether the consequences of the complex structure of transformative agreements for monitoring warrant a broader discussion based on the findings of this work. Of course, full disclosure of open access invoicing through a community-owned open data service would help assessment, but comparisons between publishers and countries would nevertheless remain difficult. The authors' qualitative analysis extensively explores factors that can hardly be captured by open publication metadata and would instead require a thorough analysis of the contracts themselves: authors can decide whether or not to publish open access, agreements can be capped, not all article types are eligible, there is a time lag between submission and publication, and the institutions involved differ. Funding contexts add to this complexity.

      Given this complexity, I wonder whether a focus on ESAC to disclose articles enabled by transformative agreements, which is a community effort run solely by the Max Planck Digital Library, is sufficient. Perhaps the authors can speculate on the role of existing national infrastructures and workflows around subscription-based publishing in libraries (serial cataloguing and license management)? Can they be transformed to increase the transparency and thus the accountability of this licensing model, or are these infrastructure services no longer needed in favour of international open metadata initiatives that have been set up together with transformative agreements? Another consideration might be the role of discovery services such as Unpaywall/OpenAlex or OpenAIRE. I think the paper provides a very good overview of these different actors from a data perspective, but the case study would benefit from a discussion of how the different actors involved, particularly in the Dutch context, could work together to achieve more streamlined monitoring through the combination of data services and standardised agreements, as much data seems to already exist internally.

      Apart from this, I have two other considerations:

      I suggest that the results section could benefit from earlier mention of the number and proportion of articles that could not be matched. Although these are effectively summarised in the conclusion (last paragraph, page 12), incorporating this information earlier would improve the presentation of findings. I consider the identification of publications missed by the open method due to limitations in the availability of corresponding author data and funding information to be an essential outcome of this research.

      Regarding methodology, I had some difficulty understanding where the disambiguation of ISSN variants took place. The text indicates that this information was obtained from the JCT ("The data from the Journal Checker tool is exposed through a publicly available API. It used ISSN (more precisely ISSN-L) to identify journals and RoR-IDs to identify institutions", page 6). However, to my knowledge, ISSN-L retrieval is not supported by the JCT API. Upon examination of the code, it appears that ISSN linking to ISSN-L may have been established using Unpaywall data, while Figure 2 refers to Crossref in this context.

      In summary, I would like to congratulate the authors on this important contribution and recommend that all those concerned with open access business models, and those involved in improving the evidence base for transformative agreements, read this important work and adopt the open method presented.

    1. In "Researchers Are Willing to Trade Their Results for Journal Prestige: Results from a Discrete Choice Experiment", the authors investigate researchers’ publication preferences using a discrete choice experiment in a cross-sectional survey of international health and medical researchers. The study investigates how researchers negotiate trade-offs among factors such as journal impact factor, review helpfulness, formatting requirements, and usefulness for promotion when deciding where to publish. The research is timely; as the authors point out, reform of research assessment is currently a very active topic. The design and methods of the study are suitable and robust. The use of focus groups and interviews in developing the attributes for study shows care in the design. The survey instrument itself is generally very well-designed, with important tests of survey fatigue, understanding (dominant choice task) and respondent choice consistency (repeat choice task) included. Respondent performance was good or excellent across all these checks. Analysis methods (pMMNL and latent class analysis) are well-suited to the task. Pre-registration and sharing of data and code show commitment to transparency. Limitations are generally well-described.

      Below, I give suggestions for clarification/improvement. Except for some clarifications on limitations and one narrower point (reporting of qualitative data analysis methods), my suggestions are only that – the preprint could otherwise stand, as is, as a very robust and interesting piece of scientific work.

      1. Respondents come from a broad range of countries (63), with 47 of those countries represented by fewer than 10 respondents. Institutional cultures of evaluation can differ greatly across nations, and we can expect variability in exposure to the messages of DORA (seen, for example, in the level of permeation of DORA as measured by signatories in each country, https://sfdora.org/signers/). In addition, some contexts may mandate or incentivise publication in some venues using measures including IF, but also by requiring journals to be in certain databases like WoS or Scopus, or by having preferred journal lists. I would suggest the authors include in the Sampling section a rationale for taking this international approach, including any potentially confounding factors it may introduce, and then add the latter to the limitations as well.

      2. Reporting of qualitative results: In the introduction and methods, the role of the focus groups and interviews seems to have been just to inform the design of the experiment. However, results from that qualitative work then appear as direct quotes within the discussion to contextualise or explain results. In this sense, the qualitative results are being used as new data. Given this, I feel that the methods section should include a description of the methods and tools used for qualitative data analysis (currently it does not). In addition, to my understanding (and this may be a question of disciplinary norms – I’m not a health/medicine researcher), new data should generally not be introduced in the discussion section of a research paper. Rather, the discussion is meant to interpret, analyse, and provide context for the results that have already been presented. I therefore feel that the paper would benefit from the qualitative results being reported separately within the results section.

      3. Impact factors – Discussion section: While there is interesting new information on the relative trade-offs amongst other factors, the most emphasised finding, that impact factors still play a prominent role in publication venue decisions, is hardly surprising. More could perhaps be done to compare how the levels of importance reported here differ from previous results from other disciplines or over time (I know a like-for-like comparison is difficult but other studies have investigated these themes, e.g., https://doi.org/10.1177/01655515209585). In addition, beyond the question of whether impact factors are important, a more interesting question in my view is why they still persist. What are they used for and why are they still such important “driver[s] of researchers’ behaviour”? This was not the authors’ question, and they do provide some contextualisation by quoting their participants, but still I think they could do more to contextualise what is known from the literature on that to draw out the implications here. The attribute label in the methods for IF is “ranking”, but ranking according to what and for what? Not just average per-article citations in a journal over a given time frame. Rather, impact factors are used as proxy indicators of less-tangible desirable qualities – certainly prestige (as the title of this article suggests), but also quality, trust (as reported by one quoted focus group member “I would never select a journal without an impact factor as I always publish in journals that I know and can trust that are not predatory”, p.6), journal visibility, importance to the field, or improved chances of downstream citations or uptake in news media/policy/industry etc. Picking apart the interactions of these various factors in researchers’ choices to make use of IFs (which is not in all cases bogus or unjustified) could add valuable context.
      I’d especially recommend engaging at least briefly with more work from Science and Technology Studies, especially Müller and de Rijcke’s excellent Thinking with Indicators study (doi: 10.1093/reseval/rvx023), but also those authors’ other work, as well as work from Ulrike Felt, Alex Rushforth (esp. https://doi.org/10.1007/s11024-015-9274-5), Björn Hammarfelt and others.

      4. Disciplinary coverage: (1) A lot of the STS work I talk about above emphasises epistemic diversity and the ways cultures of indicator use differ across disciplinary traditions. For this reason, I think it should be pointed out in the limitations that this is research in Health/Med only, with questions on generalisability to other fields. (2) Also, although the abstract and body of the article do make clear the disciplinary focus, the title does not. Hence, I believe the title should be slightly amended (e.g., “Health and Medical Researchers Are Willing to Trade …”)

    1. eLife Assessment

      This is a useful tool for code-less analysis of patterns in cell migratory behaviours in vivo using intravital microscopy data and allows correlation with spatial features of the tumour microenvironment. There is a clear need for these tools to make quantitative analysis, comparison and interpretation of complex cell tracking data more accessible and solid evidence is provided of its applicability to tracks generated by both proprietary and open tracking software.

    2. Reviewer #1 (Public review):

      In this work, Rios-Jimenez and Zomer et al. have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of intravital microscopy (IVM) data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module') and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). A key strength is that it is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. In addition, demo datasets are available on the authors' GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour-bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge), which were used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation.

      While the analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment, conclusions are appropriately tempered in the absence of additional experiments and controls.

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells, to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. The data generated are consistent with those produced during the original studies, and provide some additional (albeit preliminary) insights beyond those previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches.

      While the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking) which will help ensure its accessibility to a broader range of researchers. Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment.

      When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'.

    1. Reviewer #1 (Public review):

      Summary:

      This paper attempts to measure the complex changes of consciousness in the human brain as a whole. Inspired by the perturbational complexity index (PCI) from classic research, the authors introduce the simulation PCI (𝑠𝑃𝐶𝐼) of a time series of brain activity as a measure of consciousness. They first use large-scale brain network modeling to explore its relationship with network coupling and input noise. The authors then verify the measure with empirical data collected in previous research.

      Strengths:

      The conceptual idea of the work is novel. The authors measure the complexity of brain activity from the perspective of dynamical systems. They provide a comparison of the proposed measure with four other indexes. The text of this paper is very concise, supported by experimental data and theoretical model analysis.

      Comments on revisions:

      The manuscript is in good shape after revision. I would suggest that the authors open-source the code and data in this study.

    1. Author response:

      We thank the reviewers for their valuable feedback. We will prepare a revision of the manuscript based on these suggestions and comments. We are sure these revisions will improve the paper.

      The only major point we wish to clarify is that this is the first and only manuscript describing the toolbox; it is not a version update. Although it shares a similar name with its 2015 MATLAB predecessor (Nili et al., PLoS Comput Biol), rsatoolbox was designed from scratch. The two have no code or structural overlap beyond implementing some similar methods.

      Developed publicly since 2019, rsatoolbox reflects a decade of research in RSA methodology across multiple labs and incorporates new dissimilarity metrics, RDM comparators, inferential procedures, and visualization methods. Importantly, although we cite several papers describing methods implemented in the toolbox, this is the first manuscript to present the toolbox as a whole, its design principles, and the unified analytical framework it offers.

      We are sorry about the forgotten placeholder and the links not working. The links work for us in the pdf at least and we will certainly fix the placeholder as soon as possible.

    1. See how the gods showered glorious gifts on my father Peleus, from the moment of his birth, wealth and possessions beyond other men, kingship of the Myrmidons, and though but a mortal man, a goddess for a wife. Yet some god brought evil even to him, no crowd of princes, but an only son doomed to an untimely end. He receives no care from me, since I sit here in the land of Troy, far from my own country, bringing harm to you and your children. And you, my aged lord, they say you once were happy, renowned for your wealth and your sons, in all the lands, from the isle of Lesbos, where Macar reigned, through upper Phrygia to the boundless Hellespont. But from the moment that the heavenly gods brought this wretched war upon you, all has turned to battle and slaughter. Endure, let your heart not grieve forever, Sorrowing for your son will achieve nothing, you’ll not bring him back to life, though life will bring you other sorrows.’
      1. The shared sorrow of losing a loved one is displayed as Priam and Achilles weep together. Achilles then begins to express empathy for Priam, relating his own father's situation to Priam's. Achilles states, "See how the gods showered glorious gifts on my father Peleus, from the moment of his birth, wealth and possessions beyond other men, kingship of the Myrmidons, and though but a mortal man, a goddess for a wife. Yet some god brought evil even to him, no crowd of princes, but an only son doomed to an untimely end." This quote shows Achilles' understanding of Priam's situation and displays the tragic parallel between their lives. The shared grief is further displayed as Achilles offers Priam words of encouragement while reflecting on his actions since arriving in Troy and how they have affected Priam's life. The text states, "He receives no care from me, since I sit here in the land of Troy, far from my own country, bringing harm to you and your children. And you, my aged lord, they say you once were happy, renowned for your wealth and your sons, in all the lands, from the isle of Lesbos, where Macar reigned, through upper Phrygia to the boundless Hellespont. But from the moment that the heavenly gods brought this wretched war upon you, all has turned to battle and slaughter. Endure, let your heart not grieve forever, Sorrowing for your son will achieve nothing, you’ll not bring him back to life, though life will bring you other sorrows.’" Overall, these quotes show the inevitable price paid in war and the consequences for those loyal to the heroic code. The lives of loved ones are always at stake: in the case of Achilles, his own life is at stake, and for Priam, the life of his loyal son Hector was at stake.
    1. Advanced Context Engineering for Agents - Summary

      Overview

      • Source: https://www.youtube.com/watch?v=IS_y40zY-hc
      • Type: Technical Conference Talk
      • Length: ~14 minutes (YC Root Access)
      • Speaker: Dexter Horthy, Founder of Human Layer (YC Fall 24)
      • Key Focus: Advanced context engineering techniques for scaling coding agents in production environments

      Executive Summary

      Dexter Horthy presents a systematic approach to context engineering that transforms AI coding from prototyping to production-ready development. He demonstrates how spec-first development, intentional context management, and structured workflows enable teams to ship complex code in large repositories while maintaining quality and team alignment.

      Key Insights

      • Context as Core Constraint: "LLMs are pure functions. The only thing that improves the quality of your outputs is the quality of what you put in, which is your context window." - Context management is the fundamental lever for agent performance
      • Spec-First Development: "In the future where AI is writing more and more of our code, the specs, the description of what we want from our software is the important thing." - Specifications become the source code equivalent in AI-driven development
      • Hierarchy of Impact: "A bad line of research, a misunderstanding of how the system works and how data flows and where things happen can be thousands of bad lines of code." - Early-stage errors compound exponentially through the development process

      Key Elements (CRITICAL FOR LOOKUP)

      Key Concepts

      • Context Engineering: "Everything that makes agents good is context engineering" - [Core philosophy throughout talk]
      • Intentional Compaction: "Be very intentional with what you commit to the file system and the agent's memory" - [08:48 timestamp]
      • Spec-First Development: "We were forced to adopt spec first development because it was the only way for everyone to stay on the same page" - [03:12 timestamp]
      • 40% Context Rule: "Our goal all the time is to keep context utilization under 40%" - [11:00 timestamp]
      • Research-Plan-Implement Workflow: "We have three phases research, plan and implement" - [11:00 timestamp]

      Key Personalities

      • Dexter Horthy: "My name is Dex. I'm the founder of a company called Human Layer" - [Speaker, YC Fall 24]
      • Sean Grove: "Sean Grove, the new code. He talked about how we're all vibe coding wrong" - [Referenced expert on coding practices]
      • Geoffrey Huntley: "Geoff Huntley works on Sourcegraph Amp... he wrote this thing called Ralph Wiggum as a software engineer" - [Context optimization expert]
      • Vaibhav: "I do a podcast with another YC founder named Vaibhav. He built BAML" - [Collaboration partner, BAML creator]

      Key Tools/Technologies

      • Human Layer: "I'm the founder of a company called Human Layer" - [Dexter's company focused on context engineering]
      • BAML: "He built BAML... has anyone here used BAML before?" - [Programming language/tool for AI workflows]
      • Sub Agents: "A lot of people saw Claude Code sub agents and they jumped in... but they're really about context control" - [Context management technique]
      • MCP Tools: "If you have MCP tools that return big blobs of JSON, that's going to flood your context window" - [Tool integration consideration]

      Key References

      • 12 Factor Agents: "We wrote a weird little manifesto called 12 factor agents, principles of reliable LLM applications" - [April 22nd foundational work]
      • Stanford Study: "The Stanford study... they ingested data from 100,000 developers... AI engineering and software leads to a lot of rework" - [Research on AI coding effectiveness]
      • Ralph Wiggum Article: "He wrote this thing called Ralph Wiggum as a software engineer" - [Context optimization methodology]
      • Open Source Prompts: "This is our research prompt. It's really long. It's open source. You can go find it" - [Available implementation resources]

      Detailed Analysis

      The Problem with Current AI Coding

      • Naive Approach Fails: "The most naive way to use a coding agent, which is to shout back and forth with it until you run out of context or you give up or you cry" - [04:48]
      • Complex Systems Challenge: "Doesn't work in big repos, doesn't work for complex systems" - [02:44]
      • Rework Problem: "AI engineering and software leads to a lot of rework. So even if you get benefits, you're actually throwing half of it away" - [01:45]

      Context Engineering Solutions

      • Intentional Compaction Strategy: "Even if we're on the right track, if we're starting to run out of context, be very intentional with what you commit to the file system and the agent's memory" - [05:45]
      • Sub-Agent Context Control: "The parent agent can get right to work without having to have the context burden of all of that reading and searching" - [07:27]
      • Frequent Compaction Workflow: "Building your entire development workflow around context management" - [08:48]

      Three-Phase Implementation

      • Research Phase: "Understand how the system works and all the files that matter and perhaps like where a problem is located" - [11:00]
      • Planning Phase: "Tell me every single change you're going to make. not line by line, but like include the files and the snippets" - [11:12]
      • Implementation Phase: "If the plan is good, I'm never shouting at Claude anymore. And if I'm shouting at Claude, it's because the plan was bad" - [11:59]

      Actionable Takeaways

      1. Implement Spec-First Development: Start with detailed specifications before any code generation
      2. Maintain 40% Context Utilization: Keep context windows under 40% capacity for optimal performance
      3. Use Three-Phase Workflow: Structure all development as Research → Plan → Implement
      4. Review Plans, Not Code: Focus human review on specifications and plans rather than generated code
      5. Implement Intentional Compaction: Regularly compress context with structured progress files
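
      The 40% rule in takeaway 2 can be sketched as a simple budget check. This is a minimal illustration, not code from the talk: the class name `ContextBudget`, the helper `estimate_tokens`, and the four-characters-per-token heuristic are all assumptions introduced here, and the 170,000-token window is only the figure quoted in the summary.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Tracks how much of a context window is in use (hypothetical helper)."""

    window_tokens: int        # total context window size, e.g. 170_000
    threshold: float = 0.40   # the talk's "under 40%" utilization target

    def utilization(self, used_tokens: int) -> float:
        """Fraction of the window currently occupied."""
        return used_tokens / self.window_tokens

    def should_compact(self, used_tokens: int) -> bool:
        """True once utilization reaches the threshold and compaction is due."""
        return self.utilization(used_tokens) >= self.threshold


def estimate_tokens(text: str) -> int:
    """Crude size estimate: roughly 4 characters per token.

    This is an assumption for illustration, not a real tokenizer.
    """
    return max(1, len(text) // 4)
```

      With a 170,000-token window, 80,000 used tokens is about 47% utilization, so `should_compact` would flag that it is time to write a progress file and start a fresh context; 60,000 tokens (about 35%) would not.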

      Technical Details

      • Tools/Technologies: Human Layer, BAML, Sub-agents, MCP tools, Context compaction systems
      • Requirements: ~170,000 token context windows, structured prompt engineering, team workflow transformation
      • Implementation Notes: Open-source prompts available, requires significant team process changes

      Case Study Results

      • BAML Rust Codebase: "We decided to see if we could one-shot a fix to a 300,000 line Rust codebase... The PR was so good the CTO did not know I was doing it as a bit and he had merged it" - [11:12]
      • Boundary CEO Session: "For 7 hours we sat down and we shipped 35,000 lines of code... he estimated that was 1 to two weeks of work roughly" - [12:44]
      • Team Productivity: "Our intern Sam... shipped two PRs on his first day. on his eighth day, he shipped like 10 in a day" - [13:30]

      Open-Source Prompts Discovery

      FOUND! The research and planning prompts Dexter mentioned are available in Human Layer's GitHub repository:

      Research Prompt

      • Location: https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/research_codebase.md
      • Purpose: Comprehensive codebase research using parallel sub-agents
      • Key Features:
      • Spawns specialized agents (codebase-locator, codebase-analyzer, thoughts-locator, etc.)
      • Structured research document generation with YAML frontmatter
      • File path and line number references for developer navigation
      • Integration with thoughts directory for historical context

      Planning Prompt

      • Location: https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/create_plan.md
      • Purpose: Interactive implementation plan creation through iterative process
      • Key Features:
      • 5-step process: Context Gathering → Research & Discovery → Plan Structure → Detailed Writing → Sync & Review
      • Automated vs Manual success criteria separation
      • Phase-based implementation with specific file changes and verification steps
      • Integration with specialized research agents

      Implementation Methodology

      These prompts demonstrate the practical application of Dexter's three-phase workflow:

      1. Research Phase: Uses research_codebase.md to understand system architecture
      2. Planning Phase: Uses create_plan.md to create detailed implementation specifications
      3. Implementation Phase: Structured execution with clear success criteria

      References & Follow-up

      • Context Engineering is Paramount: The central thesis is that the quality of an AI agent's output is entirely dependent on the quality of the input context. Improving agents is a matter of improving the context you provide them. > "The only thing that improves the quality of your outputs is the quality of what you put in, which is your context window."

      • Critique of Naive Agent Usage: The speaker criticizes the common practice of iteratively prompting an agent without a structured plan. He likens this to writing code, compiling it, and then throwing away the source code, as the valuable "spec" (the prompts and conversation) is lost. > "the idea of sitting and talking to an agent for two hours and figuring out and exactly specifying what you want to do and then throwing away all the prompts and committing the code is basically equivalent to... you checked in the compiled asset and you threw away the code."

      • The "Spec-First" Workflow: To manage the complexity of large, AI-generated pull requests, the speaker's team adopted a "spec-first" development process. This shifts the focus from reviewing code to reviewing detailed plans and research documents. > "we were forced to adopt spec first development because it was the only way for everyone to stay on the same page."

      • Three-Phase Context Management: The core of their process involves three distinct phases, each designed to create high-quality context for the next:

        1. Research: The agent first explores the codebase to understand the system, identifying relevant files and logic. The output is a research document.
        2. Plan: Based on the research, a detailed implementation plan is created, specifying all intended changes, files to be modified, and testing strategies.
        3. Implement: The agent executes the plan, with the context window kept clean and focused (under 40% utilization) by progressively marking parts of the plan as complete.

          "we have three phases research, plan and implement."

      • The Hierarchy of Leverage: The talk emphasizes that errors in the early stages have a cascading effect. A mistake in the research phase can lead to thousands of lines of incorrect code, making the research and planning documents the most critical artifacts to review. > "a bad line of code is a bad line of code. And a bad part of a plan can be hundreds of bad lines of code. And a bad line of research... can be thousands of bad lines of code."

      • Redefining Code Review: Code review's most important function is maintaining mental alignment within a team. Reviewing concise, well-structured plans is more effective for this than trying to parse thousands of lines of AI-generated code. > "code review is about a lot of things, but the most important part is mental alignment."

      • Proven Results: This methodology has allowed the team to solve complex problems and ship massive amounts of code at an accelerated pace, including successfully fixing a bug in a 300,000-line Rust codebase in a single attempt. > "we did get it merged. The PR was so good the CTO did not know I was doing it as a bit and he had merged it by the time we were recording the episode."

      • Future Outlook: The speaker predicts that the technology of coding agents will become a commodity. The true differentiator for teams will be their ability to adapt their workflows and communication to effectively harness these tools.

        "I kind of maybe think coding agents are going to get a little bit commoditized, but the team and the workflow transformation will be the hard part."

    1. Open source dependencies as a supply chain risk and attack surface, and how Obsidian mitigates them: - reimplement small functions directly in your own code - fork modules and maintain them as your own code base - for large libraries, include version-locked files - strongly limit the third-party packages that ship in your code to others

      For those lockfile-pinned dependencies, have a process for updates (and for onboarding a new one), and don't rush to update what already works. Use time as a buffer: issues with third-party code will surface over time.
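
      The "time as a buffer" idea can be sketched as a filter that only lets through updates that have been public for a minimum number of days. This is an illustrative sketch, not Obsidian's actual process: the function name, the 30-day default, and the package names in the usage note are all hypothetical.

```python
from datetime import date, timedelta


def updates_old_enough(candidates, today, min_age_days=30):
    """Return the candidate updates released at least `min_age_days` ago.

    `candidates` maps a label such as "pkg 1.2.3" to its release date.
    The 30-day default is an arbitrary assumption for illustration.
    """
    cutoff = today - timedelta(days=min_age_days)
    return [name for name, released in candidates.items() if released <= cutoff]
```

      For example, with `today = date(2025, 9, 30)`, a hypothetical `"pkg-a 2.1.0"` released on 2025-09-25 would be held back, while `"pkg-b 1.4.2"` released on 2025-07-01 would be eligible for the update process.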

    1. We create a process with the make-process constructor like this:

         make-process
           name "hello"
           procedure
             ' display "hello"

       This creates a process with the name “hello”, which will print the string "hello" once the process is executed. The procedure field holds the Scheme code that does all the work of saying “hello”. We will talk about the procedure field a little later and show how to write code snippets in languages other than Scheme. Often we will want to refer to previously created processes later, for example to combine them in a workflow definition. To do that we need to bind the created processes to variable names. Here we bind the above process to a variable named hello:

         define hello
           make-process
             name "hello"
             procedure
               ' display "hello"

       This is a very common thing to do, so the GWL offers a shorter syntax for not only creating a process but also binding it to a variable. The following example is equivalent to the above definition:

      Again, this would be better as an annex. Knowing that make-process |> bind with define = process is too low-level for a beginner in the language. The most common use case is to use process, so this should be an annex at the end, or a footnote, for those few who want to look under the hood.

    1. | Feature | TP Doc Update | Benchmark Study | Full TP Report |
       | --- | --- | --- | --- |
       | Prior Year Comparison | ✓ | ✗ | ✓ |
       | Database Search | Limited | Comprehensive | Comprehensive |
       | Statistical Analysis | Basic | Detailed | Complete |
       | NACE Code Filtering | ✗ | ✓ | ✓ |
       | OECD Compliance | Partial | ✓ | Full |
       | Expert Review | ✗ | ✓ | ✓ |
       | Industry Expert Consultation | ✗ | Basic | Comprehensive |
       | Documentation | Update only | Benchmark report | Complete set |
       | Turnaround Time | 48 hours | 48 hours | 7-10 days |
       | Price | €250 | €500 | Custom |

      Also update the features. The 48-hour delivery date is fine for updating financial data only. If we update the complete document (add-on service), it takes 1 week; a benchmark study takes 4 days; a local file and master file take 2, at most 3, weeks depending on complexity.

    2. TP Doc Update
       Update your existing transfer pricing documentation
       €250 per update
       Please ensure all provided data is accurate and double-checked before submission
       What's included:
       - Compare with prior year companies
       - BVD/VAT ID analysis
       - Multi-year comparison
       - Financial indicator tracking
       - Previous document review
       - Professional update report
       ⚠️ Data accuracy verification required - please double-check all provided information
       Turnaround time: 48 hours
       Format: Updated documentation
       Order TP Doc Update

       Most Popular
       Benchmark Study
       Custom database search with detailed analysis
       €500 per study
       What's included:
       - Custom search criteria
       - NACE code filtering
       - Independence verification
       - Financial metrics analysis
       - Geographic screening
       - Detailed benchmark report
       - Statistical analysis included
       - Basic consultation from industry experts included
       Turnaround time: 48 hours
       Format: Professional report
       Order Benchmark Study

       Custom Pricing
       Full TP Report
       Complete transfer pricing analysis with expert review
       Custom Quote
       Pricing based on scope and complexity
       What's included:
       - Excel template provided
       - Comprehensive TP analysis
       - OECD-compliant methodology
       - Expert review included
       - Master File preparation
       - Local File documentation
       - Audit-ready documentation
       Turnaround time: 7-10 days
       Format: Complete documentation set
       Get Custom Quote

      Expand these too, based on the services document.

    1. submit button in the notebook flow

      In the current validated idea, PinchSmall is composed of a UI/UX that helps guide the context and system definition, as well as the selection of the process and technologies to be included in the problem. Finally, there is a section for defining the "frontend" settings, from which the user navigates into the database of reporting plots/tables and selects the type of report. It must generate QMDs that allow the user to write the report. The latest request to IT is the possibility to render a report that, when required, can be modified in a "what you write is what you see" form, never revealing the code-like QMD language to an average user.

      The only part that is not properly implemented is the "system data, results interpretation, etc.", which I discussed with Lorenzo Aimone in order to try to adapt his student work on LLMs, but it seems to require a dedicated task until the October 24 demonstration. Please verify this task with Lorenzo.

    1. Professional Benchmark Services
       Expert analysis with 48-hour delivery and amendment window. Payment only after finalization.

       TP Documentation Update
       €250
       Update existing transfer pricing documentation
       - Prior year comparison
       - BVD/VAT ID analysis
       - Multi-year tracking
       - Financial indicators
       - Document review
       - 48-hour turnaround

       Most Popular
       Benchmark Study
       €500
       Comprehensive database search and analysis
       - Custom search criteria
       - NACE code filtering
       - Financial metrics analysis
       - Independence verification
       - Detailed benchmark report
       - 48-hour turnaround

       Full TP Report
       Custom
       Complete transfer pricing analysis
       - Excel template provided
       - Comprehensive TP analysis
       - OECD-compliant methodology
       - Expert review included
       - Master File preparation
       - 7-10 days delivery

      Same as for the home page cards.

    1. Tabbing for keyboard-only users works well. There's a prominent ‘Skip to content’ button on the first tab. And attention has been placed on code structure and alt text for non-text elements.

      This covers a few really important accessibility needs all at once. The skip button makes life easier for people who navigate with a keyboard, so they don’t have to fight through menus every time. Good code structure and alt text mean screen reader users aren’t left out, since images and layout are described properly. It’s a great example of thinking beyond just mouse users and making sure everyone can actually use the site.

    1. Think about discriminating based on height, weight, or postal code. In the context of the law, these are not ‘protected’ categories, so can one assume that they are OK to use as basis of “discrimination”?

      Indirect discrimination: height and weight may relate to disability or specific ethnic groups. Using postal codes may amount to systemic discrimination, as they may relate to race, ancestry, or place of origin. Tribunals will consider impact, not just intent. Even if a category isn't listed, if the effect of a policy or decision results in unfair treatment of a protected group, it can still be challenged. Courts increasingly recognize intersectionality and implicit bias.

  4. clavis-nxt-user-guide-clavisnxt-erste-dev.apps.okd.dorsum.intra clavis-nxt-user-guide-clavisnxt-erste-dev.apps.okd.dorsum.intra
    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources to success. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

      We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach where the weights of three archetypal functions (flat, step, linear) are learned from experience. Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (lines 335-341):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea.”
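
      The kind of constrained Bayesian learner discussed in this exchange, one that weighs evidence for a flat, a step, and a linear function over three ticket levels, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the success probabilities in `CANDIDATES`, the uniform prior, and the treatment of the three functions as competing hypotheses are all choices made here for clarity.

```python
# Hypothetical success probabilities for investing 1, 2, or 3 tickets
# under each archetypal function (values are assumptions for illustration).
CANDIDATES = {
    "flat":   {1: 0.6, 2: 0.6, 3: 0.6},
    "step":   {1: 0.2, 2: 0.8, 3: 0.8},
    "linear": {1: 0.2, 2: 0.5, 3: 0.8},
}


def update_weights(weights, tickets, success):
    """One Bayesian update of the function weights after a single trial."""
    unnormalised = {}
    for name, weight in weights.items():
        p = CANDIDATES[name][tickets]
        unnormalised[name] = weight * (p if success else 1.0 - p)
    total = sum(unnormalised.values())
    return {name: value / total for name, value in unnormalised.items()}


def fit(trials):
    """Start from a uniform prior and update on (tickets, success) pairs."""
    weights = {name: 1.0 / len(CANDIDATES) for name in CANDIDATES}
    for tickets, success in trials:
        weights = update_weights(weights, tickets, success)
    return weights


def predicted_success(weights, tickets):
    """Weighted-average success probability for a given investment."""
    return sum(w * CANDIDATES[name][tickets] for name, w in weights.items())
```

      The sketch also makes the Reviewer's concern concrete: with only three fixed shapes, the learner cannot represent any function outside their weighted combinations, which is why a more flexible basis would need more than three investment levels to be distinguishable.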

      Reviewer #2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. 

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

      We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing out on important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test for the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04 p=.43, SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.  
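
      The statistical point here, that a variable can track the contrast between two positively correlated measures more strongly than it tracks either measure alone, can be reproduced with simulated data. The sketch below uses synthetic numbers only, not the authors' data: the variable names are stand-ins, and the mixing coefficient (0.6) and noise level (5.8) are assumptions chosen so that the two simulated scores correlate at roughly the level reported above.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, seed=0):
    """Simulate two correlated scores and a third variable tied to their contrast."""
    rng = random.Random(seed)
    soa, sds, bias = [], [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)     # component common to both scores
        contrast = rng.gauss(0, 1)   # component that pushes the scores apart
        soa.append(shared + 0.6 * contrast)
        sds.append(shared - 0.6 * contrast)
        bias.append(contrast + rng.gauss(0, 5.8))  # noisy readout of the contrast
    diff = [a - b for a, b in zip(soa, sds)]
    return {
        "soa_sds": pearson(soa, sds),
        "soa_bias": pearson(soa, bias),
        "sds_bias": pearson(sds, bias),
        "diff_bias": pearson(diff, bias),
    }
```

      In this construction the two scores are positively correlated because of the shared component, the third variable relates to each score only weakly and in opposite directions, and its correlation with their difference is roughly twice as large in magnitude. In a sample of a few hundred participants the weak bivariate correlations could easily fail to reach significance while the contrast does.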

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

      “Conversely, in the ‘elastic controllability model’, the beta distributions represent a belief about the maximum achievable level of control (𝑎<sub>Control</sub>, 𝑏<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (𝑎<sub>elastic≥1</sub>, 𝑏<sub>elastic≥1</sub>) or specifically two (𝑎<sub>elastic2</sub>, 𝑏<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more controllability estimates can be made more precise by knowing how much resources the agent is willing and able to invest (Supplementary Note 1).”

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, among many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1).

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success (e.g., you're more likely to get a heads in two flips than one). I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer’s concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.   

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do. 

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship. 

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity, eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one; the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical; this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.
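
      The distinction between recovery *correlation* and recovery *bias* drawn above can be illustrated with a minimal sketch. The numbers below are hypothetical and are not taken from the study; the sign convention (recovered minus simulated) is also an assumption made for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_bias(simulated, recovered):
    """Average signed error of recovery (recovered minus simulated)."""
    return mean(r - s for s, r in zip(simulated, recovered))


# Hypothetical parameter values: recovery preserves the ordering perfectly
# but pushes low values upward.
simulated = [0.1, 0.2, 0.3, 0.4, 0.5]
recovered = [0.3 + 0.5 * s for s in simulated]

if __name__ == "__main__":
    print("correlation:", round(pearson_r(simulated, recovered), 3))
    print("mean bias:  ", round(mean_bias(simulated, recovered), 3))
```

      In this toy case the correlation is perfect while the recovered values are systematically too high, which is why a recovery correlation on its own cannot rule out a bias.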

      The logic the Reviewer describes breaks down when one considers the dynamics of participants’ resource investment choices. A low elasticity bias in a participant’s prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased). 

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now note, the simulated parameter space encompassed both low and high elasticity biases (range=[.02,.76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no such significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered)= -.03, p=.25), showing that our experiment could accurately identify low and high elasticity biases.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p – p<sup>2</sup> for two tickets; the p<sup>2</sup> term captures the fact that you only take the second attempt if you fail on the first. A consequence of this is that buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.
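
      The contrast described above, between sequential independent attempts and a success probability that simply doubles, can be sketched numerically. The function names and the values of p are illustrative assumptions, not parameters of the actual task.

```python
def p_sequential(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    For two attempts this equals 2p - p**2, the intuitive causal model
    in which a second attempt only matters if the first one failed.
    """
    return 1.0 - (1.0 - p) ** attempts


def p_linear(p: float, attempts: int) -> float:
    """Success probability that scales linearly with attempts (capped at 1)."""
    return min(1.0, p * attempts)


if __name__ == "__main__":
    # Illustrative per-attempt success probabilities (hypothetical values).
    for p in (0.1, 0.2, 0.4):
        seq, lin = p_sequential(p, 2), p_linear(p, 2)
        print(f"p={p:.1f}  sequential={seq:.2f}  linear={lin:.2f}  gap={lin - seq:.2f}")
```

      The gap between the two rules is p<sup>2</sup>, so it grows with the per-attempt success probability: a participant reasoning with the sequential rule would expect less benefit from the second extra ticket than a doubling rule delivers.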

      We thank the Reviewer for this comment, and agree that participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps by how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants’ behavior deviates from the task's true structure, and it would be worthwhile to address this deviation in future studies.

      That said, there is no reason that this will make participants appear to be generally underestimating elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity estimates for the second and third tickets, but on average, it would not change the overall elasticity estimate. On the other hand, if such a participant is only exposed to outcomes for two and three tickets, they would come to judge the difference between the first and second tickets too highly, thereby overestimating elasticity.

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants’ choices worse. 

      You're right; saying these analyses provide "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by γ<sub>controllability</sub>)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. When social workers act on behalf of clients who lack the capacity to make informed decisions, social workers should take reasonable steps to safeguard the interests and rights of those clients.

      When working at my first job, we had a client who was aging and lacked the capacity to make informed decisions. We felt her grandson was taking advantage of her financially. We followed through with this section of the NASW Code by implementing safeguards to ensure she was not being taken advantage of and to make sure her rights and interests were being represented and protected.

    2. Social workers should be aware that posting personal information on professional Web sites or other media might cause boundary confusion, inappropriate dual relationships, or harm to clients.

      This section of the NASW code of ethics raises important questions regarding power and structural inequality because it is not clear and concise about what is personal information that should not be shared. I would assume that the personal information includes personal email, personal number, home address, etc. I believe this code should go into more detail to deter a violation.

    1. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shaikh and Assisi addresses a timely and important question related to the neural circuit mechanisms underlying spatial representations during navigation. Concretely, they present a model of the medial entorhinal cortex (MEC) with biophysically detailed conductance-based stellate cells that can perform path integration and reveal two potential mechanisms underlying two forms of predictive coding by grid cells in the MEC. One mechanism uses HCN channels to explain predictive coding in MEC layer II grid cells equivalent to ~5% of the diameter of a grid field, and the other uses asymmetric connections between interneurons and stellate cells, resulting in a ~25% predictive bias of layer III grid cells. The methods and model are technically sound, and the model is expected to be useful for computational neuroscientists studying the neural mechanisms of spatial navigation.

      Strengths:

      One strength of the model is its use of conductance-based neuron models of stellate cells and interneurons, adding important biophysical constraints and details to existing continuous attractor network models of grid cells. The model fills a gap in the literature by providing mechanisms for predictive coding constrained by biophysical properties of stellate cells and simplified network topology.

      Weaknesses:

      A weakness of the model is that the neural network is relatively small (five sheets with 71 × 71 neurons each), and the 2-D toroidal topology is further simplified to a 1-D ring attractor consisting of three rings with 192 neurons each. The model incorporates biophysical detail at the single-neuron level, but not at the network level. For example, it includes only stellate cells and a generic interneuron type, and does not implement data-driven connectivity patterns.

      The restricted network size and the limited experimental knowledge about connectivity among stellate cells, principal cells, and different interneuron types in the MEC could be addressed in more detail. Moreover, the manuscript lacks a thorough discussion of assumptions common to most continuous attractor network (CAN) models of grid cells, such as the use of "hand-crafted" connections between direction-sensitive conjunctive grid cells and network cells to drive attractor shifts. Including such a discussion would strengthen the manuscript. This is especially relevant given the authors' explicit claim that they have revealed two mechanisms underlying the emergence of a predictive code in the MEC. In this reviewer's view, the work demonstrates a potential mechanism, but one that requires experimental verification. The significance of the model would thus be increased by providing more experimentally testable predictions of the model.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed only analyzed dual TCR Treg cells from different tissue locations based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets corresponding to these tissue locations. The main reason for this is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cell populations in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than that of other T cell types. This will help in understanding the distribution characteristics of dual TCR Treg cells in different tissues and provide an mRNA-level basis for conducting functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis considered the sample size issue, we conducted a detailed review and analysis. This study utilized the inverse Simpson index to evaluate TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg cell and dual TCR Treg cell repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, differences in detected T cell numbers by sequencing cannot be excluded from the diversity analysis. Following recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates TCR composition of dual TCR Treg cells exhibits diversity, similar to single TCR Treg cells; however, diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences; cells with identical TCR sequences are part of the same clone, with ≥2 counts defined as expansion. For example, in Blood, there are 958 clonal types and 1,228 cells, of which 449 are expansion cells. In R1, we systematically verified and revised clonal expansion cells across all tissue samples according to a unified standard.
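
      The two quantities discussed in this response, the inverse Simpson index and clonal expansion defined as clones with ≥2 cells, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the toy clone labels are hypothetical, and "expansion cells" is read here as cells belonging to a clonotype observed at least twice.

```python
from collections import Counter


def inverse_simpson(clone_ids):
    """Inverse Simpson index: 1 / sum(p_i**2) over clonotype frequencies p_i."""
    counts = Counter(clone_ids)
    total = sum(counts.values())
    return 1.0 / sum((c / total) ** 2 for c in counts.values())


def expansion_summary(clone_ids, min_cells=2):
    """Count cells and clonotypes separately, so the two are not conflated."""
    counts = Counter(clone_ids)
    expanded = [c for c in counts.values() if c >= min_cells]
    return {
        "cells": sum(counts.values()),
        "clonotypes": len(counts),
        "expanded_clonotypes": len(expanded),
        "expanded_cells": sum(expanded),
    }


# Toy input: one clonotype label per cell (hypothetical data).
toy_cells = ["A", "A", "A", "B", "B", "C", "D"]

if __name__ == "__main__":
    print(round(inverse_simpson(toy_cells), 3))
    print(expansion_summary(toy_cells))
```

      Under the same reading, the Blood figures quoted above would imply 1,228 − 449 = 779 singleton cells and therefore 958 − 779 = 179 expanded clonotypes. Because the index depends on how many cells were sampled, comparing it across repertoires of very different sizes would typically require downsampling to a common depth first.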

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications. In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Based on the original manuscript, we have supplemented our reading, understanding, and citation of closely related literature (Tuovinen, 2006, Blood, 108:4063 (lines 44 and 175 in R1); Schuldt, 2017, J Immunol, 199:33 (lines 44 and 178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, where we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, conducting a comparative analysis of the origins of dual TCR Treg cells and non-Treg cells with dual TCRs will be a meaningful direction. Currently, peripheral induced Treg cells can originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj<0.05 for comparisons.

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding research on other cell types in each tissue sample, preventing analysis of other cells and factors involved in the development and differentiation of single TCR Treg and dual TCR Treg cells. The literature suggested by the reviewer indicates that the development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights the complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissue-specific differences found in the original authors' studies of Treg cells across different organ sites. This suggests further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have supplemented in the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal: This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient? We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data across multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount values of individual cells (i.e., the total sequencing reads or UMI counts for each cell) as auxiliary parameters to further optimize the assessment of cell quality. Because doublets may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. Among the five tissue sample datasets, we further utilized hashtag oligonucleotide (HTO) labeling to eliminate doublets and negative cells. HTO labeling provides each cell with a unique barcode that differentiates cells from different tissue sources, so that doublets and negative cells can be accurately identified by analyzing HTO labels. After the removal of chimeric cells, all samples exhibited T cells that possessed two or more TCR clones. This phenomenon validates the reliability of the methodological approach employed in this study and indicates that the analytical results accurately reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, the discussed dual TCR Treg cells and single TCR Treg cells specifically refer to those T cells that possess both functional α and β chains, which are capable of forming TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as those Treg cells in which the α or β chains involved in TCR pairing are non-functional.

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Based on the original manuscript, we have further supplemented and elaborated on the uniqueness and relative proportions of dual TCR T cell pairs in skin tissue samples in Section R1. Due to the scarcity of T cells in skin samples, we included some non-Treg cells during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-regulatory T cells may indeed affect the statistical representation of dual TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type dual TCR pairings are primarily found within the non-regulatory T cell population in the skin. In response to this point, we have provided a detailed explanation of this analytical result in the revised manuscript R1. Furthermore, for the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have elaborated upon in the discussion section. We thank you once again for your valuable suggestions.

      (3) Issue of Cell Contamination: In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified the data sources for tissues such as blood, kidney, and liver. In the study by Oliver T et al., several techniques were employed to distinguish leukocytes from blood from those in tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that the analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, the authors perfused anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, the authors used HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity: The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.
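The two quantities discussed in this exchange, repertoire diversity and CDR3 overlap, can be made concrete with a small sketch. It uses the inverse Simpson index and the Jaccard index as stand-ins; these are common choices but are assumptions here, and they are not necessarily the diversity score or overlap measure used in the manuscript. The CDR3 strings are invented placeholders.

```python
from collections import Counter


def inverse_simpson(cdr3s):
    """Inverse Simpson index: 1 / sum(p_i^2) over clonotype frequencies."""
    counts = Counter(cdr3s)
    total = sum(counts.values())
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


def jaccard_overlap(a, b):
    """Shared unique CDR3s divided by all unique CDR3s in either sample."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Invented repertoires: one dominated by a single clone, one evenly spread.
concentrated = ["CASSL", "CASSL", "CASSL", "CASRG"]
even = ["CASSL", "CASRG", "CASTQ", "CASPD"]

div_concentrated = inverse_simpson(concentrated)  # clone-dominated repertoire
div_even = inverse_simpson(even)                  # evenly distributed repertoire
overlap = jaccard_overlap(concentrated, even)
```

Both measures depend on how many cells were sequenced per sample, which is one reason the response cautions against direct statistical comparison of the single TCR and dual TCR indices.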

      (5) Functional Evaluation of Dual TCR Tregs: This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions: In this analysis of scRNA+TCR-seq data, we innovatively discovered a higher proportion of dual TCR Treg cells in different tissue sites, which exhibited differences in tissue characteristics. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have supplemented R1 with the feasibility of further exploring the functions of tissue-resident dual TCR T cells and the necessity for potential application research.

      (6) Appropriateness of Statistical Analysis: When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. They should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions: Based on the original manuscript, we have supplemented the specific statistical methods for the differences in cell proportions and gene expression in R1.

    1. Reviewer #1 (Public review):

      Summary:

      This work proposes a new approach to analyse cell-count data from multiple brain regions. Collecting such data can be expensive and time-intensive, so, more often than not, the dimensionality of the data is larger than the number of samples. The authors argue that Bayesian methods are much better suited to correctly analyse such data compared to classical (frequentist) statistical methods. They define a hierarchical structure, partial pooling, in which each observation contributes to the population estimate to more accurately explain the variance in the data. They present two case studies in which their method proves more sensitive in identifying regions where there are significant differences between conditions, which otherwise would be hidden.

      Strengths:

      The model is presented clearly, and the advantages of the hierarchical structure are strongly justified. Two alternative ways are presented to account for the presence of zero counts. The first involves the use of a horseshoe prior, which is the more flexible option, while the second involves a modified Poisson likelihood, which is better suited to datasets with a large number of zero counts, perhaps due to experimental artifacts. The results show a clear advantage of the Bayesian method for both case studies. The code is freely available, and it does not require a high-performance cluster to execute for smaller datasets. As Bayesian statistical methods become more accessible in various scientific fields, the whole scientific community will benefit from the transition away from p-values. Hierarchical Bayesian models are an especially useful tool that can be applied to many different experimental designs. However, while conceptually intuitive, their implementation can be difficult. The authors provide a good framework with room for improvement.

      Weaknesses:

      As with any Bayesian model, the choice of prior can significantly influence the results. The authors explain how the methodology can be adapted to different data properties, though selecting an appropriate prior or likelihood may not always be straightforward. They propose a 'standard workflow' as an alternative to traditional approaches, which could and should be used alongside established methods while Bayesian techniques continue to evolve and improve.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Alternative possibilities are discussed regarding the prior and likelihood of the model. Given that the second case study inspired the introduction of the zero-inflation likelihood, it is not clear how applicable the general methodology is to various datasets. If every unique dataset requires a tailored prior or likelihood to produce the best results, the methodology will not easily replace more traditional statistical analyses that can be applied in a straightforward manner. Furthermore, the differences between the results produced by the two Bayesian models in case study 2 are not discussed. In specific regions, the models provide conflicting results (e.g., regions MH, VPMpc, RCH, SCH, etc.), which are not addressed by the authors. A third case study would have provided further evidence for the generalizability of the methodology.”

      We hope in this paper to propose a ‘standard workflow’ for these data; this standard workflow uses the horseshoe prior and we propose that this is the approach used to describe cell count data instead of the better established, but to our thinking, inefficient, t-testing approach.

      The horseshoe prior is robust and allows a partially-pooled model to be used while weighing up the contribution of different data points. This is an analogue of excluding outliers and, in any analysis, it is normal to investigate further if there are points being excluded as outliers. Often this reveals a particular challenge with the data; in the case of the data here, there are a lot of zeros, indicating that some samples should be excluded because the preparation failed to tag cells rather than because there were no cells to tag. The idea behind the ZIP example is to show that the Bayesian method can allow for this sort of further investigation and, indeed, as the reviewer notes, this sort of extended analysis is often bespoke, tailored to the data.
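To make the zero-inflation idea concrete, the sketch below writes out the textbook zero-inflated Poisson (ZIP) log-probability, in which a count is a "structural" zero with probability `pi` (for example, a failed preparation) and otherwise follows a Poisson distribution with rate `lam`. The parameter names and this particular parameterisation are assumptions for illustration; the manuscript's model may be specified differently.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of count k under Poisson(lam), with lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def zip_logpmf(k: int, lam: float, pi: float) -> float:
    """Log-probability of count k under a zero-inflated Poisson.

    pi is the probability of a structural zero (0 <= pi < 1).
    A zero can arise either structurally or from the Poisson itself.
    """
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) + poisson_logpmf(k, lam)


# With a mean of 50 cells, a plain Poisson makes a zero count essentially
# impossible, whereas the ZIP assigns it roughly probability pi.
log_p_zero_poisson = poisson_logpmf(0, 50.0)
log_p_zero_zip = zip_logpmf(0, 50.0, 0.1)
```

Under a plain Poisson likelihood a handful of zeros would drag the estimated rate down sharply; the mixture lets the model attribute them to the inflation component instead.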

      We have clearly failed to explain that the ‘standard workflow’ we propose to replace the more traditional methods is the first one we describe, with the horseshoe prior; this produces better results on both datasets than the traditional approach. However, we also feel it is useful to show how a more tailored follow-on can be useful; we need to make it clear that this is intended as an illustration of an ‘optional extra’ rather than a part of the more straightforward ‘standard workflow’.

      To make this clearer we have altered the text in several locations:

      • end of Introduction: added clarifying sentence “Here, our aim is to introduce a ‘standard’ Bayesian model for cell count data. We illustrate the application of this model to two datasets, one related to neural activation and the other to developmental lineage. For the second dataset, we also demonstrate a second example extension Bayesian model.”

      • Section Hierarchical modeling: “Our goal in both cases is to quantify group differences in the data. We present a ‘standard’ hierarchical model. This model reflects the experimental features common to cell count experiments and reflects the hierarchical structure of cell count data; the standard model is designed to deal robustly and efficiently with noise. On some occasions, to reflect a specific hypotheses, the structure of a particular experiment or an observed source of noise, this model can be further refined or changed to target the analysis. We will give an example of this for our second dataset.”

      • Section Horseshoe prior: “The alternative is via a flexible prior such as the horseshoe Carvalho et al., 2010; Piironen and Vehtari, 2017. This more generic option may be suitable as a default ‘standard’ approach in the typical case where outliers are poorly understood.”

      • Discussion: word ‘standard’ added to sentence: “Our standard workflow uses a horseshoe prior, along with the partial pooling, this allows our model to deal effectively with outliers.”

      • Discussion: modified sentence “The horseshoe prior model workflow we have exhibited here is intended as a standard approach.”

      Indeed, because the horseshoe prior deals robustly with outliers, whereas the ZIP is intended to model the outliers, any substantial difference between the two should be examined carefully. The referee is right to point out that we have not explained this in any detail and has helpfully listed a few brain regions where there are differences. This is useful, particularly since the examples listed illustrate the opportunities and hazards this sort of data presents. To address this, we have added a new version of Figure 6 to the revised manuscript.

      Previously Figure 6 showed two example brain regions: MPN and TMd. We have now added MH and SCH to the figure, and new text commenting on the insights the plots provide, both in the Results and Discussion.

      Reviewer #2 (Public review):

      “A clearer link between the experimental data and model-structure terminology would be a benefit to the non-expert reader.”

      This is a very good point and we are acutely aware through our own work how difficult it can be moving between fields with different research goals, different scientific cultures and different technical vocabularies. Just as it can be difficult translating from one language to another without losing nuance and meaning, it can be a real challenge finding technical terms that are useful for the non-expert reader while retaining the precision the application requires! In the long run, we hope that, just as some of the very specialized vocabulary that surrounds frequentist statistics has become familiar to working experimental scientists, the precise terminology involved in Bayesian modelling will become familiar and transparent. However, in advance of that day, we have included a glossary of terms at the end of the main text, and have made numerous small tweaks to make sure that the link between data and model terminology is clearer and better explained.

      Reviewer #1 (Recommendations for the authors):

      (1) “I would strongly recommend that the authors include more case studies in the manuscript, and address the qualitative differences between the different versions of the model.”

      We agree that our method will only become established when it is applied to more datasets; we hope to contribute to further analysis and we know other people are already using the approach on their own data. We do, however, feel that adding more datasets to this paper will make it longer and more complex; the plan, instead, is to use the method on novel datasets to test specific hypotheses, so that the results will include novel scientific findings as well as adding another illustration of the Bayesian approach applied to data that is already well studied.

      (2) “Figure 6 is not discussed in the main text.”

      We had discussed the results presented in Figure 6 in the second paragraph of the section “Case study two – Ontogeny of inhibitory interneurons of the mouse thalamus”, however the reviewer is right in that we did not directly refer to the Figure – this was an oversight. In any case, in the revised manuscript we present a new version of Figure 6 (in response to above comment), which is now explicitly cited in the text.

      Revised Figure 6: Example data and inferences highlighting model discrepancies. On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: HDIs and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian ZIP model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (MPN, SCH, TMd) or large-valued outliers (MH) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.

      Reviewer #2 (Recommendations for the authors):

      (1) “This is a generally well-written methodology paper that also provides the underlying code as a resource. As a reviewer outside both cell-count modelling and hierarchical-Bayesian approaches (though with a general interest in the topics) I found the method a little difficult to follow and would have liked to have been left with a better understanding of how the method is applied to the data. For example, in Figure 1 we are introduced to brain region count, animal count, and “items”. Then in the next line: pooling, model, structure, population and etc in subsequent lines. It is not clear what the subscripts (the pools?) are referring to: are they different regions R or animals N? These terms need to be better linked to the data and/or trimmed. Having said that, the later results look like a solid contribution to the field with a significant reduction in uncertainty from the Bayesian approach over the frequentist one. A future version of the manuscript, therefore, would benefit from greater precision of language as well as an economy and greater focus of terms linking the method to the biology. This is particularly the case around the exposition parts in Figure 1, Figure 2, and the “Hierarchical modelling” section.”

      This is another important point. We have now made numerous small changes to tighten up the text in the paper, in response to both this point and the next point.

      (2) “Language throughout could be sharpened. Subjectivity like “surprising outliers” could be removed and quirky grammar like “often small, ten is a typical” improved. There are also typos “an rate” etc that should be tidied up.”

      As per previous response, we have made numerous tweaks and small improvements and feel that the paper is stronger in this respect.

      (3) “Figure 1 caption. “It is a spectrum that depends” Is spectrum the right word here? Also, “thicker stroke” what does this refer to? Wasn’t immediately clear. In A, why is the whole animal within the R bracket that signifies brain regions, and then the brain regions are within the N bracket that signifies whole animals? Apart from the teal colouring, what are the other coloured regions in the image referring to? Improving this first figure would greatly help a reader unfamiliar with the context of the approach.”

      We have replaced the word “spectrum” with “continuum”. We have replaced “Observed quantities have been highlighted with a thicker stroke in the graphical model.” with “The observed data quantities, y<sub>i</sub> to y<sub>n</sub>, are highlighted with a thick line in the model diagrams”. We have added the following text to describe the red and green lines in panel A: “green and red lines indicate regions labeled as damaged”.

      (4) “On P2 there is no discussion of priors when running through the advantage of the Bayesian approach. Is this a choice or an oversight? Priors do have a role in the later analysis.”

      A short additional paragraph has been added to the introduction outlining the advantage of having a prior, but also noting that the obligation to pick a prior can be intimidating and that suggesting priors is one of the contributions of our paper: “A Bayesian model also includes a set of probability distributions, referred to as the prior, which represent those beliefs it is reasonable to hold about the statistical model parameters before actually doing the experiment. The prior can be thought of as an advantage, it allows us to include in our analysis our understanding of the data based on previous experiments. The prior also makes explicit in a Bayesian model assumptions that are often implicit in other approaches. However, having to design priors is often considered a challenge and here we hope to make this more straightforward by suggesting priors that are suitable for this class of data.”

      (5) “On P4 more explanation would help greatly. Formulas like 23*10*4 or 50*6+50*4 are presented without explanation. What are the various numbers being multiplied? Regions, animals? Again, a clearer link between biological data and model structure would be advantageous.”

      We have now modified this line to clearly state the numbers’ sources: “The index i runs over the full set of samples, which in this case comprises 23 brain regions ×10 animals ×4 groups ≈920 datapoints in the first study, and 50 brain regions × 6 HET animals + 50 brain regions × 4 KO animals ≈500 datapoints in the second.”
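The sample counts in the revised sentence follow directly from the experimental design, as the short calculation below shows. The variable names are illustrative. The products come to 920 and 500; the quoted text marks both totals with "≈".

```python
# Study one: every animal contributes a count for every brain region.
regions_one, animals_per_group, groups = 23, 10, 4
n_study_one = regions_one * animals_per_group * groups

# Study two: two genotypes with different numbers of animals.
regions_two, het_animals, ko_animals = 50, 6, 4
n_study_two = regions_two * het_animals + regions_two * ko_animals
```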

      (6) “P6 and Results. Is it possible to show examples of the data set sampled from? Perhaps an image or two for the two experiments. Both Figures 4 and 5 as they currently are could be made slightly smaller to provide space for a small explanatory sub-panel. This would help ground the results.”

      This is a good idea. We have now added heatmap visualisations of both entire datasets to revised versions of Figures 4 and 5 (assuming that this is what the reviewer was suggesting).

    1. Law of the Instrument

      This applies to the efficiency aspect of UX because it is important to make sure that users are getting what they need done with the correct tools. For example, VS Code has auto-completion when writing code, which can make completing it much faster than writing code on Notepad. However, there can be certain projects where other text editors are needed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs, based on their recent work predicting the transverse relaxation rates (R2) of IDPs, which was trained on 45 IDP sequences and their corresponding R2 values. The discovery is that IDPs interact with drugs mostly through aromatic residues, which is easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed, we did not apply the helix boost portion of SeqDYN to DIRseq, and now state this explicitly (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei at al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2. In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).
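The two performance criteria named in this response can be sketched for a single protein as follows. Everything specific here is an assumption for illustration: the per-residue scores and CSP values are invented, the cutoffs are arbitrary, and the authors' actual definitions of true and false positives may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def confusion(scores, csps, score_cut, csp_cut):
    """Count (true positives, false positives, false negatives).

    A residue is 'predicted' if its score reaches score_cut and
    'observed' if its CSP reaches csp_cut (both cutoffs hypothetical).
    """
    tp = fp = fn = 0
    for s, c in zip(scores, csps):
        predicted, observed = s >= score_cut, c >= csp_cut
        if predicted and observed:
            tp += 1
        elif predicted:
            fp += 1
        elif observed:
            fn += 1
    return tp, fp, fn


# Invented per-residue values for a six-residue stretch.
scores = [0.2, 0.9, 0.8, 0.1, 0.7, 0.3]
csps = [0.01, 0.08, 0.06, 0.02, 0.01, 0.05]

r = pearson(scores, csps)
tp, fp, fn = confusion(scores, csps, score_cut=0.5, csp_cut=0.04)
```

Reporting both a correlation and threshold-based counts is useful because a predictor can rank residues well overall while still placing its top-scoring residues in the wrong regions.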

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with α-syn (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, explanations on why 12 Qs are chosen should be given. How about other numbers of Q or using other residues (e.g., the commonly used residues in making linkers, like GS/PS or A)?

      As we already explained, “Gln was selected because its 𝑞 value is at the middle of the 20 𝑞 values.” (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but forms a helix-loop-helix structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewer’s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out “drug-interacting residues”.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We did add caveats of CSP in Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. The annotation of display items is given in the captions of Figs. 2 and 3; we now add it to the Fig. 4 caption as well.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      “Augur well” means to be a good sign (for something). We use this phrase here in this meaning.

      (6) page 5: "we raised the 𝑞 value of Asp to be the same as that of Glu" → suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They infer selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelope proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      Weaknesses:

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis.

      As suggested, we have now addressed this limitation by inferring epistatic fitness landscapes for CH505, CH848, SHIV.CH505, and SHIV.CH848. Indeed, the computational burden of the epistasis inference procedure was one constraint that motivated us to consider only additive fitness in the previous version of our paper. The original approach developed by Sohail et al. (2022) tested only sequences with <50 sites due to this limitation, far smaller than the ones we consider. Beyond this computational constraint, we also believed that 1) an additive fitness model may suffice to capture local fitness landscapes, and practically, 2) epistatic interactions are more challenging to validate than the effects of individual mutations, making the interpretation of the model more complex.

      However, after performing the analyses described in this paper, we developed a new approach for identifying epistatic interactions that can scale to much longer sequences (Shimagaki et al., Genetics, in press). We therefore applied this method to infer an epistatic fitness landscape for the HIV and SHIV data sets that we studied. As in that work, we focused on short-range (<50 bp) interactions which we could more confidently estimate from data. We have added a section in the SI describing the epistatic fitness model and our analysis. 

      Overall, we found substantial agreement between the epistatic and purely additive models in terms of the estimated fitness effects of individual mutations (new Supplementary Fig. 8) and overall fitness (Supplementary Fig. 9). Consistent with our prior work, we did not find substantial evidence for very strong epistatic interactions (Supplementary Fig. 10). This does not necessarily mean that strong epistatic interactions do not exist; rather, this shows that strong interactions don’t substantially improve the fit of the model to data, and thus many are regularized toward zero. While the biological validation of epistatic interactions is challenging, we found that the largest epistatic interactions, which we defined as the top 1% of all short-range interactions, were modestly but significantly enriched in the CD4 binding site, V1 and V5 regions for CH505 and in the CD4 binding site, V4, and V5 for CH848. In addition, mutation pairs N280S/V281A and E275K/V281G, which confer resistance to CH235, ranked in the top 15% of all epistatic interactions in CH505.

      We have now included an additional section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which discusses our epistatic analyses (page 6, lines 415-464), along with the above Supplementary Figures and a technical section in the SI summarizing the epistasis inference approach.

      Although the evolution of broadly neutralizing antibodies (bnAbs) is a motivating question in the introduction and discussion sections (and the title), the relevance of the analysis and results to better understanding how bnAbs arise is not clear. The only result presented in direct connection to bnAbs is Figure 6.

      It is true that, while bnAb development is a major motivator of our study, our analysis focuses on HIV-1 and does not directly consider antibody evolution. We have now brought attention to this point as a limitation directly in the Discussion. Following the suggestion below in the “Recommendations for the authors,” we have edited our manuscript to place more emphasis on viral fitness and somewhat reduce the emphasis on bnAbs, though this remains an important motivating factor. Specifically, the Abstract now begins

      Human immunodeficiency virus (HIV)-1 evolves within individual hosts to escape adaptive immune responses while maintaining its capacity for replication. Coevolution between HIV-1 and the immune system generates extraordinary viral genetic diversity. In some individuals, this process also results in the development of broadly neutralizing antibodies (bnAbs) that can neutralize many viral variants, a key focus of HIV-1 vaccine design. However, a general understanding of the forces that shape virus-immune coevolution within and across hosts remains incomplete. Here we performed a quantitative study of HIV-1 evolution in humans and rhesus macaques, including individuals who developed bnAbs.

      We have similarly modified the Discussion to focus first on viral fitness. In response to comments from Reviewer 3, we have also more clearly articulated how our work might contribute to the understanding of bnAb development in the Discussion.

      Questions or suggestions for further discussion:

      I list here a number of points for which I believe the paper would benefit if additional discussion/results were included.

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis. In Sohail et al (2022) MBE 39(10), p. msac199  (https://doi.org/10.1093/molbev/msac199) an extension of MPL is developed allowing one to infer epistasis. Can the authors comment on why this was not attempted here?

      I presume one possible reason is that epistasis inference requires considerably more computational effort (and more data). However, since the authors find most beneficial mutations occurring in Env, perhaps restricting the analysis to Env genes only (e.g. the trimer shown in Figure 2) can lead to tractable inference of epistasis within this segment (instead of the full genome).

      As described above, we have now addressed this comment by inferring epistatic fitness landscapes for the data sets that we consider. Our overall results using the epistatic fitness model are consistent with the ones that we previously obtained with an additive model.

      Do the authors find correlations in the inferred selection coefficients of the two samples CH505 and CH848? I could not find any discussion of this in the manuscript. Only correlations between Humans and RM are discussed.

      To address this question, we compared the fitness values and individual selection coefficients across CH505 and CH848 data sets. We found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. We found only 199 common mutations between HIV-1 amino acid sequences from CH505 and CH848 out of 868 and 1,406 total mutations, respectively. Thus, we were not surprised to find no strong relationship between fitness estimates from CH505 and CH848 data sets. 

      Reviewer #2 (Public review):

      Summary:

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 12.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. Mpl resolves genetic linkage in fitness inference from complex evolutionary histories. Nature biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves, they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      There is a lot of information to be found in the full fitness landscape of env. The enormous strength of reversion-to-consensus in the patterns is a known pattern of HIV post-infection populations but they are nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. This is all of high interest to HIV researchers.

      Strength of evidence:

      One limitation is, of course, that the fitness model is constant in time when the immune challenge is variable and changing. This simplification may complicate some interpretations.

      We agree that this is a limitation of our current approach. In prior work, we have found that the constant fitness effects of mutations that we infer typically reflect the time-averaged fitness effect when the selection changes over time (Gao and Barton, PNAS 2025; Lee et al., Nat Commun 2025). It could be difficult, however, to capture changes in selection that fluctuate rapidly with underlying immune responses. We have added a new paragraph in the Discussion that more clearly sets out some of the limitations of our analysis, including our assumption of constant selection coefficients.

      There are additional methodological and technical limitations that should be considered in the interpretation of our results. Most notably, we assume that the viral fitness landscape is static in time. While we do not expect selection for effective replication (“intrinsic” fitness) to change substantially over time, pressure for immune escape could vary along with the immune responses that drive them. In prior work, we have found that constant selection coefficients typically reflect the average fitness effect of a mutation when its true contribution to fitness is time-varying [42,43]. This may not adequately describe mutational effects that undergo large or rapid shifts in time. Future work should also examine temporal patterns in selection for individual mutations.

      Equation 12 in the methods is really a beautiful tool because it is so simple, but accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. However, the reliance on incomplete observations of the frequency leads to complications that must be carefully (re)addressed here.

      For instance, the consistent finding of strong selection in hypervariable regions is biologically intuitive but so striking, that I worry that it might be the result of a bias for selection in high entropy regions. 

      Thank you for this suggestion. We agree that it is important to carefully interrogate these results. To assess the effects of general sequence variability on inferred selection, we computed a position-specific entropy measure, $H_i$, for each site $i$. We first defined the time-dependent entropy $H_i(t) = -\sum_a x_i(a, t) \log x_i(a, t)$ at each sample time $t$, where $x_i(a, t)$ represents the frequency of amino acid/nucleotide $a$ at position $i$ and time $t$. We then computed $H_i$ as the average of $H_i(t)$ across all sample times. A new Supplementary Fig. 1 plots the entropy against the inferred selection coefficients. Although some sequence variation must be observed in order for us to infer that a mutation is beneficial, we did not find a systematic bias toward larger (more beneficial) selection coefficients at more variable sites. Overall, we found only a modest correlation between inferred selection coefficients and entropy (Pearson’s r = 0.33 and 0.29 for CH505 and CH848, respectively), which appears to be partly driven by the tendency for mutations inferred to be significantly deleterious to occur at sites with low entropy. In addition to the new Supplementary Figure, we have added a reference to this analysis in the main text:

      To test whether our results might be biased by overall sequence variability, we examined the relationship between our inferred selection coefficients and entropy, a common measure of sequence variability. Overall, we found only a modest correlation between selection and entropy, suggesting that the signs of selection that we observe are not due to increased sequence variability alone (Supplementary Fig. 1).
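
The entropy measure described in this response can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `site_entropy` and `pearson_r` and the input layout (one dictionary of allele frequencies per sample time) are assumptions made for the example, and the natural logarithm is assumed because the response does not state a base.

```python
import math


def site_entropy(freqs_over_time):
    """Average Shannon entropy of one site across its sample times.

    freqs_over_time: list of dicts (hypothetical layout), one per sample
    time, mapping each allele (amino acid or nucleotide) to its frequency.
    """
    entropies = []
    for freqs in freqs_over_time:
        # H_i(t) = -sum_a x_i(a, t) * log x_i(a, t); alleles with zero
        # frequency contribute nothing and are skipped to avoid log(0).
        h_t = -sum(x * math.log(x) for x in freqs.values() if x > 0)
        entropies.append(h_t)
    # H_i is the plain average of H_i(t) over all sample times.
    return sum(entropies) / len(entropies)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, one would compute `site_entropy` for every site and then pass the per-site entropies and the inferred selection coefficients to `pearson_r`. How selection coefficients for several alleles at one site are matched to that site's entropy is not specified in the response and is left open here.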

      Mutational and covariance terms in equation 12 might be underestimated, due to finite sampling effect in highly diverse populations. Sampling effects lead to zeros in x(t) when actual frequency zeros might be rare at the population sizes of HIV viral loads and mutation rates. Both mutational flux and C underestimation will bias selection upward in eq. 12. 

      The prior papers (1) and (2) seem to show robustness to finite sampling effects, but, again, more care needs to be shown that this robustness transfers to the amino acid inference under these conditions. That synonymous sites are rarely selected for in the nucleotide level is a good sign, and it may be a matter of simply fully explaining the amino-acid level model.

      As above, we agree that these tests are important. To assess the robustness of our results to finite sampling, we performed bootstrap sampling on the viral sequences and inferred selection coefficients using the resampled sequences. Specifically, we resampled the same number of sequences as in the original data at each time point and repeated this for all time points across all HIV-1 and SHIV data sets. A new Supplementary Fig. 11 shows a typical comparison of the original selection coefficients vs. those obtained through bootstrap resampling. Overall, we observe a high degree of consistency between the selection coefficients in each case, which is surely aided by the long time series in these data sets. As pointed out by the reviewer, uncertainty in low-frequency mutations is a particular concern, though the effects on inferred selection are mitigated by regularization. 

      We have added a section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which includes this analysis:

      Finite sampling of sequence data could also affect our analyses. To further test the robustness of our results, we inferred selection coefficients using bootstrap resampling, where we resample sequences from the original ensemble, maintaining the same number of sequences for each time point and subject. The selection coefficients from the bootstrap samples are consistent with the original data (see Supplementary Fig. 11), with Pearson’s r values of around 0.85 for HIV-1 data sets and 0.95 for SHIV data sets, respectively.
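
The resampling step described here can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: `bootstrap_time_series` and `allele_frequency` are hypothetical names, sequences are represented as plain strings grouped by sample time, and the selection inference that would be rerun on each bootstrap replicate is not shown.

```python
import random


def bootstrap_time_series(samples_by_time, rng):
    """Resample sequences with replacement, separately at each time point.

    samples_by_time: dict mapping a sample time to a list of sequences
    (hypothetical layout). Each time point keeps its original sample size.
    """
    return {
        t: [rng.choice(seqs) for _ in seqs]
        for t, seqs in samples_by_time.items()
    }


def allele_frequency(seqs, site, allele):
    """Fraction of sequences carrying `allele` at position `site`."""
    return sum(1 for s in seqs if s[site] == allele) / len(seqs)
```

A replicate could be drawn with `bootstrap_time_series(data, random.Random(seed))`. Selection coefficients inferred from each replicate would then be compared with those from the original data, for example through a Pearson correlation.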

      Uncertainty propagates to the later parts of the paper, eg. HIV and SIV shared patterns might be the result of shared biases in the method application. However, this worry does not extend to the apples-to-apples comparison of fitness trajectories across individuals (Figures 5 and 6) which I think are robust (for these sample sizes). 

      One way to address this uncertainty is to compare the fitness values and individual selection coefficients across CH505 and CH848 data sets, which was also requested by Reviewer 1. Overall, we found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. This suggests that similarities between HIV-1 and SHIV landscapes are not solely determined by potential biases in the inference approach. We have now added a reference to this point in the main text:

      In contrast, the inferred fitness landscapes of CH505 and CH848, which share few mutations in common, are poorly correlated (Supplementary Fig. 6). This suggests that the similarities between viral fitness values in humans and RMs are not artifacts of the model, but rather stem from similarities in underlying evolutionary drivers.

      The timing evidence is slightly weakened by the fact that bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Yes, we agree that this is a limitation of our analysis — bNAbs may have been present at low levels before they were detected, and we cannot definitively reject selection by bNAbs. Nonetheless, in at least one case (RM5695), rapid fitness gains were substantially separated in time from bNAb detection (roughly 2 weeks after infection vs. 16 weeks, respectively). We have now added this point in a new paragraph in the Discussion:

      While we found a strong relationship between viral fitness dynamics and the emergence of bnAbs, it may not be true that the former stimulates the latter. For example, bnAbs may have been present within each host before they were experimentally detected. Rapid viral fitness gains within hosts that developed broad antibody responses could then have been driven by undetected bnAb lineages. However, we did not find strong selection for known bnAb resistance mutations, and in at least one case (RM5695), rapid fitness gains (roughly 2 weeks after infection) substantially preceded bnAb detection (16 weeks). Still, given the limited size of the data set that we studied, it is unclear the extent to which our results will transfer to larger and broader data sets.

      Overall this is a convincing paper, part of a larger admirable project of accurately inferring complete fitness landscapes.

      Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance.

      Weaknesses:

      However, the study has some limitations. The most significant weakness is that the authors do not sufficiently discuss the implications of the observed similarities. While the identification of shared evolutionary patterns (e.g., Figure 5) is intriguing, the study would benefit from a more explicit discussion of what these findings mean, for instance, in the context of HIV vaccine design, immunotherapy, or fundamental viral-host interactions. Even speculative interpretations could provide valuable insights into the broader significance of these results.

      Thank you for this suggestion. We have now clarified the potential implications of our work in several areas. While speculative, one possible application is in vaccine design: it may be beneficial to design sequential immunogens to mimic the patterns of viral evolution associated with rapid fitness gains. This “population-based” design principle is different from typical approaches, which have focused on molecular details of virus surface proteins. 

      We have extended our discussion of our results in the context of viral evolution within and across hosts and related host species. Overall, our work suggests that there may be relatively few paths to significantly higher viral fitness in vivo. Evolutionary “contingencies” such as shifting immune pressure or epistatic interactions could influence the direction of evolution, but not so dramatically that the dynamics that we see in different hosts are not comparable. We have also connected our work more broadly to the literature in evolutionary parallelism in HIV-1 in different contexts.

      A secondary, albeit less critical, limitation is the placement of methodological details in the Supplementary Information. While it is understandable that the authors focus on results in the main text - especially since the methodology is not novel and has been previously described in earlier publications - some readers might benefit from a more thorough presentation of the method within the main paper.

      We have now modified the main text to add a new section, “Model overview,” that lays out the key steps of our approach. While we reserve technical details for the Methods, we believe that this new section provides more intuition about how our results were obtained (including a discussion of the important Eq. 12, now Eq. 3 in the main text) and our underlying assumptions.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques. While the quantitative comparison between species is a notable contribution, a deeper discussion of its broader implications would strengthen the paper's impact.

      Reviewer #1 (Recommendations for the authors):

      I suggest de-emphasizing bnAbs and focusing on selection landscape inference, which seems to be the actual focus of the paper.

      While we do not directly study antibody development in this work, bnAb development is certainly an important motivating factor. As described in the responses above, we have now modified the Abstract and Discussion to place relatively more emphasis on fitness comparisons and relatively less on bnAb development.

      Reviewer #2 (Recommendations for the authors):

      Please make sure that the MPL method is defined in this paper and its limitations are at least partially repeated.

      As noted in responses above, we have now included more methodological details in the main text of the paper, which we hope will make the intuition and assumptions involved in our analysis clearer.

      I'd like the code to better show or describe the model, I could not figure out the model details by looking at the code. It seems mostly just to be csv exporting for use with preexisting MPL code. A longer code readme would be helpful.

      We have now updated the README on GitHub to include a conceptual overview of our inference approach, which references how each step is implemented in the code.

      Reviewer #3 (Recommendations for the authors):

      Try to give some more details (not necessarily giving the full mathematical derivation) on the statistical method utilized.

      As noted above, we have now expanded our discussion of the statistical methods and assumptions in the main text.

      Figures 3 and 4 are somewhat 'messy'. Although I do not have a constructive suggestion here, I feel that with a little more effort maybe the authors could come up with something more clean.

      It is true that the mutation frequency dynamics are somewhat “choppy” and difficult to follow intuitively. To attempt to make these figures easier to parse visually, we have increased the transparency on the lines and added exponential smoothing to the mutation frequencies, resulting in smoother trajectories. The trajectories without smoothing are retained in Supplementary Fig. 3. Here we also note that this smoothing is for visual purposes only; we use the original frequency trajectories for inference, rather than the smoothed ones.
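
For readers unfamiliar with the technique, exponential smoothing of a frequency trajectory can be sketched as below. This is only an illustration: the function name `exponential_smoothing` and the smoothing factor `alpha` are assumptions (the response does not report the value used), and the sketch treats sample times as evenly spaced, which may not hold for the actual data.

```python
def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing of a sequence of frequencies.

    alpha (assumed parameter) lies in (0, 1]; smaller values smooth more
    strongly, and alpha = 1 returns the trajectory unchanged.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for v in values:
        if not smoothed:
            # The first smoothed point equals the first observation.
            smoothed.append(v)
        else:
            # Blend the new observation with the previous smoothed value.
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed
```

As the response notes, a transformation like this would be applied for display only; inference would still use the raw frequencies.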

    1. The code sheet was organized around three aspects of CSI: (1) crime statistics, for example types of offenses and demographic details about offenders and victims; (2) crime genre, for example elements typical of the genre such as the nature of plot development (such as personal involvement narratives); and (3) forensic science, that is, how CSI employs dialogue, narrative, or other programming features to present science.

      This coding system shows how CSI builds its authority, but how much of that “scientific” credibility is just storytelling? Are viewers being educated, or just entertained?

    1. As your roommate receives the message, he decodes your communication and turns it back into thoughts in order to make meaning out of it

      It's incredible to see how what someone says is a code that we undo to make sense of it, and how quickly we decode it; this is our way as humans of doing it, and it comes naturally. It is also incredible to know that other animals have their own ways of talking and showing, which are in some ways as direct as ours.

    1. With growth in the use of communication technology in various aspects of social work practice, social workers need to be aware of the unique challenges that may arise in relation to the maintenance of confidentiality, informed consent, professional boundaries, professional competence, record keeping, and other ethical considerations. In general, all ethical standards in this Code of Ethics are applicable to interactions, relationships, or communications, whether they occur in person or with the use of technology.

      The Code highlights that all standards of the Code are applicable to all dynamics whether online or in person. I would adapt this concept into my practice as it would ensure I am practicing in line with the Code in all of my work. This section does acknowledge that there are unique challenges that may arise from the use of technology, however. I aim to keep apprised of emerging technological developments and challenges to ensure ethical practice in the age of technology. As for social media, my accounts are private so clients cannot follow me.

    2. In situations when conflicting obligations arise, social workers may be faced with complex ethical dilemmas that have no simple answers. Social workers should take into consideration all the values, principles, and standards in this Code that are relevant to any situation in which ethical judgment is warranted. Social workers’ decisions and actions should be consistent with the spirit as well as the letter of this Code.

      While conflict and dilemmas are bound to occur in practice, social workers exercise an amount of power that the client does not have by making these decisions. This power-imbalance highlights the need to meet clients where they are and work collaboratively with them to set goals to put power back into their hands and to not impose our ideas or desires onto them due to our having power.

    3. Ethical decision making is a process. In situations when conflicting obligations arise, social workers may be faced with complex ethical dilemmas that have no simple answers. Social workers should take into consideration all the values, principles, and standards in this Code that are relevant to any situation in which ethical judgment is warranted. Social workers’ decisions and actions should be consistent with the spirit as well as the letter of this Code.

      This would be a perfect example of how structural inequality or power can be grounds for conflict. Our job as social workers is to be ethical while maintaining the well-being of our clients. I can think of many instances where, at the sacrifice of ethics, the social worker provides the client with an alternative source of agency that is ultimately more effective. So I would raise the question, "Is it ever worth the risk of 'codes' to advocate for a better unconventional solution for a client?" Assuming the solution is effective, would repercussions still be distributed?

    4. The NASW Code of Ethics reflects the commitment of all social workers to uphold the profession’s values and to act ethically. Principles and standards must be applied by individuals of good character who discern moral questions and, in good faith, seek to make reliable ethical judgments.

      At my field placement, I help older adults apply for programs like PAAD or Medicare Savings Programs. Many clients feel overwhelmed when they receive denial letters. This part of the Code of Ethics reminds me that my main role is to serve by breaking down confusing systems and reassuring clients that a denial does not always mean they are out of options.

    5. Alleged violations of the Code would be subject to a peer review process.

      To me, this part is a question of power and raises further questions about structural inequality. Because violation cases are peer reviewed, the process relies on fellow practitioners to be ethical and just. However, there is room for inequality in the power dynamic. For example, if the peer review process is dominated by certain groups (senior practitioners, people of certain racial, gender, or class backgrounds), then marginalized social workers might not get fair treatment.

    6. Social workers pursue social change, particularly with and on behalf of vulnerable and oppressed individuals and groups of people. Social workers’ social change efforts are focused primarily on issues of poverty, unemployment, discrimination, and other forms of social injustice. These activities seek to promote sensitivity to and knowledge about oppression and cultural and ethnic diversity. Social workers strive to ensure access to needed information, services, and resources; equality of opportunity; and meaningful participation in decision making for all people.

      On the detox unit I'm placed on, I see how structural inequities like limited access to behavioral health services, socioeconomic barriers, and stigma around addiction, all directly impact patients’ recovery. This section emphasizes the social worker’s responsibility to advocate for systemic change and challenge power biases. I wish this code offered more guidance for navigating institutional barriers, especially in healthcare settings where policies might limit autonomy for patients with more complex needs.

    7. The Code socializes practitioners new to the field to social work’s mission, values, ethical principles, and ethical standards, and encourages all social workers to engage in self-care, ongoing education, and other activities to ensure their commitment to those same core features of the profession.

      This part of the Code, although obviously useful, raises the question of how it can be applied in practice. To be more precise, I find that practicing self-care while trying to meet the needs of patients is easier said than done. If I were confronted with needing self-care, the immediate answer would be to take personal time. However, this may be hard to do given the demands of work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS, and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how the level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.”

      Strengths: 

      The manuscript is well-written, and experiments are succinctly planned and outlined. The experiments were used to provide the conclusions to what the authors were hypothesising and provide some new novel possible mechanistic insights into the whole process of regulation of tubulin glycylation in motile cilia.”

      We thank the reviewer for their support of our study and recognition of its importance to understanding microtubule glycylation and its regulation.  

      “The initial part of the manuscript where the authors discuss about the requirement of monoglycylation by TTLL8 is not new. This was established back in 2009 when Rogowski et al (2009) showed that polyglycylation of tubulin by TTLL10 occurs only when co-expressed in cells with TTLL3 or TTLL8. So, this part of the study adds very little new information to what was known.”

      Our study provides the first in vitro evidence with purified recombinant components that human TTLL8 is exclusively a monoglycylase (Figure 1) and that polyglycylation by TTLL10 requires previous priming with monoglycylation (Figure 2). Studies with purified recombinant components are the gold standard for establishing the activity of an enzyme, as cellular work can be obfuscated by the activity of other regulators. We did cite in our original submission the work by Rogowski, Gaertig and Janke from 2009 (reference 15 in the original submission) as well as the Ikegami and Setou 2009 work (reference 26 in the original submission) that established that TTLL10 polyglycylase activity requires co-expression with TTLL8 in cells. Specifically, we stated in our original submission and in the revised manuscript:

      “Cellular overexpression studies coupled with the use of antibodies that recognize mono- and polyglycylation indicate that TTLL8 is also a glycyl-initiase, while TTLL10 a glycyl-elongase (15, 26). However, direct biochemical evidence with purified enzymes for segregated initiation and elongation activity for glycylases is still lacking as does knowledge of their substrate specificity and regulation.”

      In addition to citing the Setou study, we now cite again the Rogowski, Gaertig and Janke 2009 study later in the manuscript when the cellular data are mentioned again.  Specifically, we state in the revised manuscript: 

      “This is consistent with cellular overexpression data which showed that polyglycylation signal was detected via antibody only in tubulin from cells that co-expressed TTLL8 and TTLL10, but not TTLL10 alone (15, 26).”

      “The study also fails to discuss the involvement of the other monoglycylase, TTLL3 in the entire study, which is a weakness as in vivo, in cells, both the monoglycylases act in concert and so, may play a role in regulating the activity of TTLL10.”

      We previously showed that purified recombinant TTLL3, like TTLL8, adds only monoglycines, with a preference for the β-tubulin tail (Garnham et al., PNAS 2017). Given that TTLL10 requires priming by monoglycylation, we expect that, similarly to TTLL8, TTLL3 will allow elongation of the initial monoglycine branches by TTLL10.

      (1) From the mass spec data, it appears that the Xenopus laevis TTLL10 can add up to 18 residues. However, the numbers indicated in Figure 2E seem to suggest that it is a maximum of 2-3 residues only at a particular position. Does this mean that the 13-18 residues observed are a collection of multiple short-chain polyglycylations or are there positions that the authors observed where there were chains of longer than 3 glycine residues? This would be an interesting point to note as when it was discovered in Paramecium, the polyglycyl chains were reported to be up to 34 residues (Redeker et al., Science 1994). If the authors could test the TTLL10 from Paramecium to observe if this is a consistent phenomenon across evolution or is there a biologically significant difference that is being developed, would be interesting to know.”

      Figure 2E shows a subset of the modified tails that we identified and where the position of the posttranslationally added glycine can be mapped to a specific position, or range of positions. Additional species exist. We note that the mass spectra in Figure 2B are intact LC/MS, while those in Figure 2E are MS/MS. The ionization of tubulin tail peptides with larger numbers of glycines is not as efficient as for shorter glycine chains, reducing the sensitivity of detection of species that have higher numbers of glycines. This is not as pronounced when the mass spectra are obtained from the intact protein (Figure 2B). In summary, our data support the conclusion that TTLL10 elongates polyglycine chains at multiple positions in the tubulin tail (shown in Figure 2E); however, we cannot ascertain the maximum polyglycine chain length, only the total number of glycines added.

      Testing the enzyme from Paramecium is an interesting proposal but outside the scope of this manuscript. 

      (2) While it is interesting to know that the TTLL10 binds to TTLL8-modified tubulin with a much higher affinity than unmodified tubulin, in vivo, the microtubules will be a mixture of both TTLL3- and TTLL8-modified tubulin. It would be good to see the binding of the enzyme to a tubulin that is modified by both TTLL3 and TTLL8 if the two have a greater influence on TTLL10 binding.”

      Our previous work showed that purified recombinant TTLL3 has purely monoglycylase activity, with a preference for β-tubulin (Garnham et al., PNAS 2017). The sites of monoglycylation by TTLL3 overlap with those introduced by TTLL8 on β-tubulin (the difference being mainly that TTLL3 is more selective towards β-tubulin and thus has lower activity on α-tubulin). TTLL8 introduces additional monoGlys on the α-tubulin tail. Therefore, it is unlikely that TTLL10 will have a different response to microtubules that carry similar numbers of Gly residues, regardless of whether introduced by TTLL8 or TTLL3 and 8. Our data show that TTLL10 binding increases with Gly number, but that the gains in affinity plateau as the density of glycine residues on the tails increases above a certain threshold, likely because one TTLL10 molecule recognizes one monoGly branch, and steric hindrance on the tubulin tail prevents further recruitment of additional TTLL10 molecules.

      (3) The authors have always increased the number of monoglycines in beta-tubulin more than in alpha-tubulin. Is there a rationale for this? Since TTLL8 is known to predominantly modify alpha-tubulin (Rogowski et al., 2009; Gadadhar et al., 2017) why did the authors not check for the increased binding of the TTLL10 on dimers where the number of monoglycines on alpha-tubulin is higher than 1.1? Especially when they themselves observe in their mass spec that even on alpha-tubulin there are 1, 2, and 3 glycines added. I would like to see what happens if the ratio is high alpha-G + low beta-G”

      As our spectra in Figure 1 show, we find that TTLL8 is able to robustly modify both α- and β-tubulin in vitro, but that it shows a slight preference for β-tubulin (Figure 1B). The work from the Janke group that the reviewer is referring to (Rogowski et al., 2009 and Gadadhar et al., 2017) did not use recombinant, purified enzymes and unmodified microtubules as substrates; it used axonemal tubulin (which carries many modifications). It is therefore possible that the α-tubulin preference observed in that system when TTLL8 is overexpressed is due to other factors that do not reflect the biochemical property of the enzyme alone (for example, it could be because β-tubulin sites are not available because they are already glutamylated). As can be seen from Figure 3D, the gain in affinity when increasing the number of glycines from one glycine is small compared to the initial monoglycine added to the α- and the β-tubulin tail, likely reflecting that one tail cannot bind more than one TTLL10 at one time because of steric hindrance. Moreover, it is important here to note that glutamylases and glycylases compete for the same sites on the tubulin tails, as we have for example shown for TTLL3 and TTLL7 (Garnham et al., 2017); therefore the activities of these enzymes in vivo or with non-naïve substrates are context dependent and also influence what sites are available for TTLL10 to modify. In conclusion, by using recombinant enzymes and naïve tubulin we gain insight into the intrinsic property of these enzymes and therefore provide a framework for the interpretation of in vitro and in vivo observations.

      (4) I wonder why the authors did not use the human TTLL10 to test if this also shows similar binding to the glycylated tubulin despite the fact that it is enzymatically inactive. If it does, then it would be interesting to see the kinetics of binding of this enzyme to see if the fall off of the enzyme from the tubulin is solely driven by the level of polyglycylation only, or if it has any other mechanism involved as well.”

      Work with human recombinant TTLL10, a TTLL10 homolog which was proposed to be inactive, will be an interesting future direction but outside the scope of this manuscript. We did note in our previous manuscript (Garnham et al., 2017, Figure S5) that the residues which are mutated in the human enzyme compared to other mammals are on the dorsal face of the enzyme, far away from the active site, raising an interesting question of how they inactivate the enzyme. We need however to emphasize that our work clearly shows that it is polyglycylation on the microtubules that reduces binding of TTLL10 to microtubules, because experiments done in the absence of glycylating activity, i.e. with enzyme that was incubated with microtubules that were pre-modified with polyglycine chains but in the absence of glycine substrate (precluding any glycylation activity during the binding assay), show that the binding decreases monotonically with the number of polyglycines on the microtubule (Figures 4A, B).

      (5) In Figure 5, the authors use monoglycylated tubulin that is either glutamylated or not to show that the activity of TTLL10 is enhanced by the extent of polyglutamylation present on the tubulin. However, there is no evidence of the enzyme binding to microtubules that are only glutamylated. It would be good to test this to determine if the binding is also dependent on both monoglycylation and glutamylation or is it only the enzyme activity.

      Figure 5E shows that TTLL10 binding increases with monoglycylation alone, and that glutamylation is additive and Figures 4A, B show that it is not the enzyme activity that affects the binding, but the glycylation state of the microtubule. We did not determine binding to microtubules that were only glutamylated, because TTLL10 would not be able to elongate polyglycine chains on those microtubules, even if it bound. 

      (6) The level of polyglycylation used in Figure 5 is quite low. It would be good to see how the length of the polyglycine chain impacts TTLL10 activity in the presence of polyglutamylation, and whether this has any cooperative effect leading to longer chain polyglycylation than what is seen with only monoglycylated tubulin.

      We expect longer chain polyglycylation to have an inhibitory effect as we show in Figure 4. 

      “(7) In the overall study, the authors fail to discuss whether the activity of both the glycylases at different sites on tubulin is sequential, or modifications at different residues happen all at once. If the authors were to do a sequential time course of the modification followed by MS/MS analysis, they could get some indications about this.”

      As the data in Figure 3D show, adding more monoGly sites on a tubulin tail has a muted effect on binding, indicating that the additional mono-Gly branches do not lead to more TTLL10 recruitment because of steric hindrance, i.e. multiple TTLL10 enzymes cannot be accommodated on the same tail at the same time efficiently. This is consistent with the overall dimensions of the enzyme and the positions of its active site, which were modeled initially in our previous publication (Garnham et al., PNAS 2017). The site of TTLL10 action is pre-determined by the position of the mono-Gly branch introduced by TTLL3 or TTLL8. The length of the tubulin tail and the proximity of mono-Gly sites to each other preclude TTLL10 acting at multiple positions at once on the same tail.

      “(8) Do the modifications have any cooperative effect with respect to the sites of modification? Does modifying a particular site enhance the kinetics of modification of the other sites? Can the authors test this?”

      This would be an interesting line of future investigations.  

      “Minor points:

      (1’) The authors opine that the level of polyglycylation is regulated by the decreased binding of the TTLL10 to the polyglycylated tubulin. While this is an interesting argument, which could be a possibility based on the data they present, it would still not answer if this is a mechanism followed by TTLL10 of all species or not. If they could test the efficacy of TTLL10 from another species, to see the binding efficiency of that enzyme, it could potentially strengthen their argument of this possible mechanism.”

      The differences between the properties of TTLL10 from different organisms will be an interesting focus of future investigations, but outside the scope of this present study. However, we would like to point out that the level of sequence conservation among TTLL10 enzymes makes it unlikely that other TTLL10 enzymes do not follow a similar mechanism, albeit with possible differences in the extent of the response. We also note that we have shown that polyglycylation also inhibits binding to the microtubule of the severing enzyme katanin (Szczesna et al., Dev. Cell 2022). Therefore, these studies suggest that polyglycylation might be a more general mechanism for reducing microtubule binding affinity, since glycylation reduces the negative charge on the tubulin tails, which frequently interact with positively charged domains or interfaces in microtubule associated proteins.

      “(2) The authors indicate that glycylases act on pre-glutamylated microtubules. However, in their assays, they use unmodified tubulin, which I would presume is also not glutamylated. If this is the case, how can they justify that the enzymes prefer pre-glutamylated microtubules? This is a bit unclear. Do they mean that their tubulin is already pre-glutamylated? Have they tested this?”

      The statement regarding the action of these enzymes on glutamylated microtubules refers to the in vivo situation where polyglycylated microtubules appear in cilia biogenesis after the microtubules in the axoneme are already glutamylated. In vitro, by using microtubules that are only monoglycylated and microtubules that are both glutamylated and monoglycylated, we show that glutamylation further increases recruitment of TTLL10 to microtubules that are monoglycylated. Therefore, glutamylated microtubules will be polyglycylated preferentially over those that are not glutamylated.

      We state: “Axonemal microtubules are abundantly glutamylated. Glutamylation appears during cilia development first, followed by glycylation (12, 13), indicating that in this scenario glycylases act on pre-glutamylated microtubule substrates.”

      “(3) In continuation with the previous point, an immunoblot of their purified tubulin showing no reactivity to anti-glycylation or anti-glutamylation antibodies, which upon treatment with TTLL8 reacts to the anti-glycylation antibody would be confirmatory evidence to show that the isolated tubulin was indeed unmodified.”

      We have now included a Western blot of our TOG-purified tubulin as Figure S3 in our revised manuscript. This shows a faint signal with the pep-G1 antibody and a very strong signal after TTLL8 treatment. We are not sure whether the low signal with the pep-G1 antibody for the unmodified tubulin is due to low bona fide monoglycylation-specific signal or a low affinity nonspecific interaction of this antibody (raised against mono-glycylated tubulin tail peptides) with the unmodified tubulin. We note that this signal is clearly visible only when loading at least 0.2 micrograms of the purified tubulin. At this loading level the signal for the glycylated species is saturated. It is also important to note that we have not detected glycylated species in this tubulin either by LC-MS or MS/MS. Therefore, our data strongly indicate that the tubulin purified from tsA201 cells is not glycylated or has at most extremely low levels of glycylation. Importantly, this potential trace level of monoglycylated tubulin does not affect any of the conclusions in this study. The Western blot also shows no detectable signal with the polyglycylation antibody in the unmodified tubulin and a very strong, saturated signal after the tubulin was treated with both TTLL8 and TTLL10. We also added an additional Figure S8 that shows that the tsA201 tubulin does not give a detectable signal for glutamylation. Please see also Figure 3 from Vemu et al., Methods Enzymology 2017 where we also published a Western blot from our TOG-purified tubulin using anti-glutamylation antibodies.

      “(4) In their study, the authors have used polyglycylation of up to 10-13 residues. This brings me to my first point that in the case of Paramecium, the number was identified to be up to 34, which would mean that this enzyme has higher binding or catalytic activity. I would like to know the authors' perspective on this, as to what could potentially determine the difference in the activities of TTLL10 across species.”

      The Xenopus TTLL10 enzyme can add more glycines than the 10-13 range that we show here if the enzyme is incubated for longer periods. The fact that glycine numbers as high as 34 were detected in Paramecium does not necessarily mean that the Paramecium enzyme is more active since there is no equivalent data to compare it with from Xenopus. The only way to address potential species differences in enzyme specific activity is to purify enzymes from different species and compare their activity side-by-side.  

      (5) How was the completion of the reaction of monoglycylation and polyglycylation determined? If the enzymes were left for more than 20 minutes, did TTLL8/ TTLL10 add more glycines? What is the reason for using less tubulin (1:20 enzyme:tubulin molar ratio) for monoglycylation by TTLL8, and more tubulin (1:50 enzyme:tubulin molar ratio) for polyglycylation by TTLL10?

      Yes, if the enzymes were incubated longer, they added more glycines. The extent of glycylation was determined from the LC-MS and the incubation time was varied to obtain samples with fewer or more glycines.   The lower ratio used for TTLL10 is because of the higher specific activity of that enzyme compared to TTLL8.  

      (6) Figure S2 A, b2 ion is not indicated in the peptide sequence, while it is shown in the m/z graph.

      We thank the reviewer for the careful reading. We have corrected this in our MS/MS spectrum. 

      Reviewer #2 (Public review):

      “In their manuscript, Cummings et al. focus on the enzymatic activities of TTLL3, TTLL8, and TTLL10, which catalyze the glycylation of tubulin, a crucial posttranslational modification for cilia maintenance and motility. The experiments are beautifully performed, with meticulous attention to detail and the inclusion of appropriate controls, ensuring the reliability of the findings. The authors utilized in vitro reconstitution to demonstrate that TTLL8 functions exclusively as a glycyl initiase, adding monoglycines at multiple positions on both α- and β-tubulin tails. In contrast, TTLL10 acts solely as a tubulin glycyl elongase, extending existing glycine chains. A notable finding is the differential substrate recognition between TTLL glycylases and TTLL glutamylases, highlighting a broader substrate promiscuity in glycylases compared to the more selective glutamylases. This observation aligns with the greater diversification observed among glutamylases. The study reveals a hierarchical mechanism of enzyme recruitment to microtubules, where TTLL10 binding necessitates prior monoglycylation by TTLL8. This binding is progressively inhibited by increasing polyglycine chain length, suggesting a self-regulatory mechanism for polyglycine chain length control. Furthermore, TTLL10 recruitment is enhanced by TTLL6-mediated polyglutamylation, illustrating a complex interplay between different tubulin modifications. In addition, they uncover that polyglutamylation stimulates TTLL10 recruitment without necessarily increasing glycylation on the same tubulin dimer, due to the potential for TTLLs to interact with neighboring tubulin dimers. This mechanism could lead to an enrichment of glycylation on the same microtubule, contributing to the complexity of the tubulin code. The article also addresses a significant challenge in the field: the difficulty of generating microtubules with controlled posttranslational modifications for in vitro studies. By identifying the specific modification sites and the interplay between TTLL activities, the authors provide a valuable tool for creating differentially glycylated microtubules. This advancement will facilitate further studies on the effects of glycylation on microtubule-associated proteins and the broader implications of the tubulin code. In summary, this study substantially contributes to our knowledge of posttranslational enzymes and their regulation, offering new insights into the biochemical mechanisms underlying microtubule modifications. The rigorous experimental approach and the novel findings presented make this a pivotal addition to the field of cellular and molecular biology.”

      We thank the reviewer for their support of our work.

    1. Despite their rapid improvement, content generated by AI is considered untrustworthy and should not be used as a source for information. Generative AI creates content by analyzing and mimicking whatever content it is supplied with in its code. This means that generative AI can easily inherit bias and is not capable of discerning whether the claims it makes are accurate.

      One misconception people have about AI is that it is capable of reasoning, that it has intelligence, when that is not true. It doesn’t understand the content that it reads and writes; it simply puts information together from its database, like an advanced version of the autocorrect feature on phone keyboards. Its tendency to satisfy the user and its lack of reasoning make it very unreliable, yet it does a great job of appearing to be useful.

    1. (b) Social workers should act to expand choice and opportunity for all people, with special regard for vulnerable, disadvantaged, oppressed, and exploited people and groups

      This reminds us that not everyone starts at the same place. Some people have to face inequalities that limit access and choices. Not everyone has the same options, and many face barriers built into healthcare, housing, or benefits programs. The Code of Ethics states we should work to expand choice. A question to ask would be: how do we confront the institutions that create these inequalities? Another question is: as someone working for an agency run by the government, how do you push for greater equality without overstepping?

    1. Instead of processing raw data—the millions of pixels in each video frame—the model works in what’s known as a latent space, in which the video frames (and text prompt) are compressed into a mathematical code that captures just the essential features of the data and throws out the rest.

      Latent diffusion model definition

    1. by the way in the new Code of Laws which I suppose it will be necessary for you to make I desire you would Remember the Ladies, and be more generous and favourable to them than your ancestors. Do not put such unlimited power into the hands of the Husbands. Remember all Men would be tyrants if they could

      Abigail asking John Adams to remember the ladies is important. After all, in the fight to free America from England everyone fought: while the men fought in battles, the ladies were spies and sought information to help the cause.

    2. I long to hear that you have declared an independancy—and by the way in the new Code of Laws which I suppose it will be necessary for you to make I desire you would Remember the Ladies,

      Asking him to stop favouring men

    1. This is frankly a really good phishing email. Breaking it down: It greets the user personally with their NPM username. This makes it look personalized, so people are more likely to trust it. People are used to the idea of changing passwords for security. With that in mind, at a glance the idea of changing your two-factor auth credentials "for security reasons" isn't completely unreasonable. NPM has always been kinda weird compared to other open source package repositories, so them requiring something strange like that reads as reasonable. It sets a deadline a few days in the future. This creates a sense of urgency, and when you combine urgency with being rushed by life, you are much more likely to fall for the phishing link. It links to a website (I'm assuming it's on npm.help), and that website is used to get the two-factor credentials somehow and then start publishing new packages with the exploit code.

      What many analyses fail to highlight is that NPM has been sending a lot of really pushy emails about two-factor authentication settings over the last couple years.

      Perversely, this cargo culted best practice led to worse security.

    1. Even though some of the affected versions are currently being removed from npm, some are still available. So please use overrides in your package.json. A malicious package can still be pulled in if another dependency requires a vulnerable version range. Use the overrides feature in your package.json to force a specific, safe version of any package across your entire project.

      Perhaps the most irresponsible thing of the last week is that, among possible mitigations, priority is given to exercising even more features of the poorly conceived package.json-based system, especially where doing so is redundant with (but inferior to) a mitigation scheme that consists of checking the given source code revision into the revision control system.

      This omission is especially absurd in relation to stuff like the has-ansi package, which hasn't had a substantial change in years.

    1. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial. 

      Strengths: 

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible. 

      Weaknesses: 

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported. 

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent. 

      We agree and will make the overlap explicit, acknowledging priority and clarifying what is new here. The initial version of the preprint of Zhang et al. (2022) (https://www.biorxiv.org/content/10.1101/2022.03.16.484641v1) lacked the derivation (it referenced a Supplementary Note not yet available); it was added during the eLife submission. We worked from the preprint and missed this update, which we will now correct.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima. 

      We respectfully disagree with the reviewer on the physical plausibility of these scenarios: there is both concrete experimental and theoretical evidence for the scenarios we discussed.

      Experimental: Strom et al. (2017) (our reference 11) describes a substantially reduced protein diffusion coefficient at an in vivo phase boundary, while Hahn et al. (2011a) and Hahn et al. (2011b) (our references 27 and 28) describe transient accumulation of molecules at a phase boundary, which they attribute to the Donnan potential, but conceivably a lowered mobility could play a role.

      Theoretical: Recent work (e.g., Majee et al. (2024)) shows that charged layers could form at phase boundaries, which could either repel or attract incoming molecules, depending on their charge, thus altering the local volume fraction, resulting in a trough or peak. Arguably, the model put forth by Zhang et al. (2024) could be mapped to a potential wall, where particles are reflected, unless in a certain state. We will add sentences to the corresponding results section, as well as the discussion to make this plausibility more apparent.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2. 

      We believe we will be able to address both issues.

      Reviewer #2 (Public review): 

      Summary: 

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies. 

      Strengths: 

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits. 

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios. 

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques. 

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems. 

      Weaknesses: 

      (1) The work remains theoretical, mainly, with limited direct comparison to quantitative experimental data. 

      We agree with the reviewer; an experimental manuscript is in progress.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      We thank the reviewer for this comment. We will add several such scenarios to the discussion, including the possibility of using interface resistance as a way of ordering biochemical reactions in time, as well as its potential to exclude molecules from condensates for long time periods, which, while not effective in the long-time limit, could help on cellular timescales of minutes to hours to respond to transient events.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability. 

      The treatment of labelled and unlabelled molecules as physically identical is well supported by our experiments. Droplets under typical experimental conditions, i.e. when bleaching is not too strong, do not markedly change size or volume fraction of molecules, which would be expected if the physical properties like molecular volume or interaction strength were significantly changed. However, we do agree that in more extreme bleaching regimes the bleach step itself will change the droplet properties, but this can be avoided by tuning the FRAP laser power and dwell times accordingly.

      Our diffusivity profiles are chosen in the simplest possible way to handle typical experimental constraints (large D outside, lower D inside, potentially lowered D at the boundary) and allow for a mean-field treatment. To the best of our knowledge, the precise make-up and concentration profiles of phase boundaries in biomolecular condensates are not currently known, due to limitations in optical resolution.

      Reviewer #3 (Public review): 

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance. 

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended. 

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics). 

      We thank the reviewer for this comment and will discuss possible mechanisms and how they map to our mean-field model in more detail, both in the corresponding results section and in the discussion, as also outlined in our response to Reviewer #1.

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 is obtained from Equation 11: the logarithmic term in Eq. 11a is "n^{-1}\ln\phi_1-\ln(1-\phi)" but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)"; where is "n^{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m^2).

      We thank the reviewer. We identified that the error originates in the inline definition of the exchange chemical potential, already before equation 11. We inadvertently dropped a prefactor of n, which then shows up in the following equation as an exponent to (1-\phi^*). Very importantly, this means the main result, eq. 25, still holds, and in the revised manuscript we will correct the ensuing typographical mistakes.

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      We will substantially expand the Appendix for the numerical solutions and add an explanatory file to the repository to make clear how the code can be run, as well as its dependencies.

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability. 

      We have shown in Bo et al. (2021) that the labelling approach can be carried over to multi-component systems. Each species may, for example, encounter its own interface resistance. We will discuss this in more detail in the revised manuscript.

    1. In fulfillment of the obligation to the student, the educator shall not unreasonably deny the student access to varying points of view

      So in this scenario, how does the Code of Ethics come into play if a teacher is forced to intentionally go against their duty of informing students about varying points of view? Can they fight it by using the Code as a guide to what they should and should not be able to teach? How does that work?

    2. Dress codes have been challenged by students and teachers alike as a form of freedom of speech and expression.

      I've been seeing a lot of discourse surrounding teacher outfits/dress code on social media lately, which is interesting because in the past, it has almost always been centered on students.

    3. In fulfillment of the obligation to the student, the educator shall not unreasonably deny the student access to varying points of view

      It's absolutely wild to me that the current administration actively acts against standards like this in a code of ethics that should honestly be common knowledge.

    4. As you gathered from this activity, there is not always one right “answer” to any given situation. A Code of Ethics provides moral standards to help guide your decision making and teaching practice. It helps with what you should do. It does not provide specific directions on what to do or even how to do it.

      One thing that worries me is that the code of ethics posted by a school might differ from my values a little too much. I know that working at a school is a hard job that not many can do. A teacher should do their best, and I believe that they should follow their heart when making decisions, as long as there is no conflict with the law or a student's well-being.

    5. The educator recognizes the magnitude of the responsibility inherent in the teaching process. The desire for the respect and confidence of one’s colleagues, of students, of parents, and of the members of the community provides the incentive to attain and maintain the highest possible degree of ethical conduct (NEA, Code of Ethics, 2019).

      The teacher looks to better themselves because they want the respect of those who count on them or work with them. If a professional remains complacent, then they won't advance in their career. Their progress should be measured by a standard, though, not by their own personal goals.

    6. As a public school teacher, can you exercise your own ‘personal liberty’ in how you dress?

      I feel that teachers should be able to wear what they want to wear as long as they're covered or aren't wearing super-tight attire. If students are allowed to express themselves, then they should have the same right. I believe the only time this could be challenged is if you're working at a private school where they do have a dress code that you need to follow.

    7. In fulfillment of the obligation to the student, the educator shall not unreasonably deny the student access to varying points of view

      This was so interesting to see. I thought back on my teachers and, whether intentional or not, I felt sometimes that their point of view and others similar to their own were the only ones being discussed. So hearing this part of the code of ethics for this was interesting to hear.

    8. The educator strives to help each student realize his or her potential as a worthy and effective member of society.

      This is, I feel, the primary goal of most teachers so it's interesting to see it highlighted in such an official code of ethics.

    9. The educator recognizes the magnitude of the responsibility inherent in the teaching process. The desire for the respect and confidence of one’s colleagues, of students, of parents, and of the members of the community provides the incentive to attain and maintain the highest possible degree of ethical conduct (NEA, Code of Ethics, 2019).

      I partially agree with this and partially don't. I think the first part is true: most teachers who go into teaching have a deep respect for and understanding of the importance of a teacher's job. But I don't really think that most teachers' ethics come from the desire for respect from their students, coworkers, parents, etc.

    10. It helps with what you should do. It does not provide specific directions on what to do or even how to do it.

      This is gold to me. Every student learns differently and takes in info at their own speed. Every teacher has a different way of teaching things, even if following a curriculum or Code of Ethics.

  5. resu-bot-bucket.s3.ca-central-1.amazonaws.com resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Revised, modularized, and updated old assembly program to a modern code base removing 22 detected bugs enabling future feature implementation.

      Explain how bug removal improved functionality or user experience. Provide examples of features enabled.

    1. Author response:

      Reviewer #1, Comment (1): Terminology

      We fully acknowledge the importance of terminological consistency and will align our usage with established literature. Specifically, we will revise as follows, 

      (1) Replace “sinusoidal analysis” with either “sinusoidal modulation” (Doeller et al., 2010; Bao et al., 2019; Raithel et al., 2023) or “GLM with sinusoidal (cos/sin) regressors” (Constantinescu et al., 2016). 

      (2) Replace “1D directional domain” with either “angular domain of movement directions (0–360°)” or “directional modulation analysis”.

      Reviewer #1, Comment (2): Spectral analysis and 3-fold periodicity

      We agree that the presentation of our spectral analysis and the theoretical motivation underlying our expectation of a three-fold periodicity within hippocampal data requires further clarification.

      In our revised manuscript, we will:

      (1) Clearly articulate the theoretical motivation for anticipating a three-fold signal, explicitly linking it to the known hexagonal grid structure encoded by the entorhinal cortex.

      (2) Clarify our methodological rationale for using Fourier analysis (FFT).

      a) FFT allows unbiased exploration of multiple candidate periodicities (e.g., 3–7-fold) without predefined assumptions.

      b) FFT results cross-validate our sinusoidal modulation results, providing complementary evidence supporting the 6-fold periodicity in EC and 3-fold periodicity in HPC.

      c) FFT uniquely facilitates analysis of periodicities in behavioral performance data, which is not feasible via standard sinusoidal GLM approaches. This consistency allows us to directly compare periodicities across neural and behavioral data.

      (3) Further, we will expand our discussion to provide:

      a) A deeper interpretation of potential biological bases for the observed hippocampal three-fold periodicity.

      b) A careful examination of alternative explanations within existing hippocampal modeling frameworks.
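
      The rationale in point (2) can be illustrated with a minimal sketch. It assumes a response that has already been averaged into evenly spaced movement-direction bins, and it evaluates the discrete Fourier amplitude at each candidate k-fold periodicity. The function name `fold_amplitudes`, the 36-bin layout, and the synthetic tuning curve are hypothetical choices for illustration only; this is not the authors' analysis pipeline.

```python
import cmath
import math


def fold_amplitudes(values, folds=range(1, 8)):
    """Amplitude of each k-fold component of a signal sampled on evenly
    spaced direction bins covering 0-360 degrees (plain DFT projection)."""
    n = len(values)
    mean = sum(values) / n
    amps = {}
    for k in folds:
        # Project the mean-removed signal onto exp(-i * k * theta_j),
        # with theta_j = 2 * pi * j / n.
        coeff = sum(
            (v - mean) * cmath.exp(-2j * math.pi * k * j / n)
            for j, v in enumerate(values)
        )
        # For k < n / 2 this recovers the cosine amplitude of the component.
        amps[k] = 2 * abs(coeff) / n
    return amps


# Synthetic example (assumed values): 36 bins of 10 degrees each, with a
# strong 3-fold component and a weaker 6-fold component.
n_bins = 36
thetas = [2 * math.pi * j / n_bins for j in range(n_bins)]
signal = [1.0 + 0.5 * math.cos(3 * (t - 0.2)) + 0.1 * math.cos(6 * t) for t in thetas]

amps = fold_amplitudes(signal)
dominant = max(amps, key=amps.get)
print(dominant, {k: round(a, 3) for k, a in amps.items()})
```

      Because every candidate periodicity is scored by the same projection, no single fold has to be chosen in advance, and the same function could in principle be applied to binned behavioral scores as well as to binned neural responses.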

      Reference:

      Doeller, C. F., Barry, C., & Burgess, N. (2010). Evidence for grid cells in a human memory network. Nature, 463(7281), 657-661.

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468.

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like neural representations support olfactory navigation of a two-dimensional odor space. Neuron, 102(5), 1066-1075.

      Raithel, C. U., Miller, A. J., Epstein, R. A., Kahnt, T., & Gottfried, J. A. (2023). Recruitment of grid-like responses in human entorhinal and piriform cortices by odor landmark-based navigation. Current Biology, 33(17), 3561-3570.

    1. “If I were to go hire a consultant to help me figure out how to use Gemini CLI or Claude Code, you’re going to find a partner at one of the Big Four has no more or less experience than a kid in college who tried to use it,” he said, referring to generative AI tools from Google and Anthropic.

      This is a fair point. It’s not that AI is necessarily causing consultants to lose their jobs; consultants are still needed because they bring a level of creativity that AI currently lacks. However, since AI is such a new and rapidly developing field, consultants don’t necessarily have more experience implementing it than the employees already working within companies.

    1. Reviewer #1 (Public review):

      Summary:

      Participants learned a graph-based representation, but, contrary to the hypotheses, failed to show neural replay shortly after. This prompted a critical inquiry into temporally delayed linear modeling (TDLM)--the algorithm used to find replay. First, it was found that TDLM detects replay only at implausible numbers of replay events per second. Second, it detects replay-to-cognition correlations only at implausible densities. Third, there are concerning baseline shifts in sequenceness across participants. Fourth, spurious sequences arise in control conditions without a ground truth signal. Fifth, when reframing simulations previously published, similar evidence is apparent.

      Strengths:

      (1) This work is meticulous and meets a high standard of transparency and open science, with preregistration, code and data sharing, external resources such as a GUI with the task and material for the public.

      (2) The writing is clear, balanced, and matter-of-fact.

      (3) By injecting visually evoked empirical data into the simulation, many surface-level problems are avoided, such as biological plausibility and questions of signal-to-noise ratio.

      (4) The investigation of sequenceness-to-cognition correlations is an especially useful add-on because much of the previous work uses this to make key claims about replay as a mechanism.

      Weaknesses:

      Many of the weaknesses are not so much flaws in the analyses, but shortcomings when it comes to interpretation and a lack of making these findings as useful as they could be.

      (1) I found the bigger picture analysis to be lacking. Let us take stock: in other work, during active cognition, including at least one study from the Authors, TDLM shows significant sequenceness. But the evidence provided here suggests that even very strong localizer patterns injected into the data cannot be detected as replay except at implausible speeds. How can both of these things be true? Assuming these analyses are cogent, do these findings not imply something more destructive about all studies that found positive results with TDLM?

      (2) All things considered, TDLM seems like a fairly 'vanilla' and low-assumption algorithm for finding event sequences. It is hard to see intuitively what the breaking factor might be; why do the authors think ground truth patterns cannot be detected by this GLM-based framework at reasonable densities?

      (3) Can the authors sketch any directions for alternative methods? It seems we need an algorithm that outperforms TDLM, but not many clues or speculations are given as to what that might look like. Relatedly, no technical or "internal" critique is provided. What is it about TDLM that causes it to be so weak?

      Addressing these points would make this manuscript more useful, workable, and constructive, even if they would not necessarily increase its scientific breadth or strength of evidence.

    2. Reviewer #2 (Public review):

      Summary:

      Kern et al. investigated whether temporally delayed linear modeling (TDLM) can uncover sequential memory replay from a graph-learning task in human MEG during an 8-minute post-learning rest period. After failing to detect replay events, they conduct a simulation study in which they insert synthetic replay events, derived from each participant's localizer data, into a control rest period prior to learning. The simulations suggest that TDLM only reveals sequences when replay occurs at very high densities (> 80 per minute) and that individual differences in baseline sequenceness may lead to spurious and/or lackluster correlations between replay strength and behavior.

      Strengths:

      The approach is extremely well documented and rigorous. The authors have done an excellent job re-creating the TDLM methodology that is most commonly used, reporting the different approaches and parameters that they used, and reporting their preregistrations. The hybrid simulation study is creative and provides a new way to assess the efficacy of replay decoding methods. The authors remain measured in the scope/applicability of their conclusions, constructive in their discussion, and end with a useful set of recommendations for how to best apply TDLM in future studies. I also want to commend this work for not only presenting a null result but thoroughly exploring the conditions under which such a null result is expected. I think this paper is interesting and will be generally quite useful for the field, but I believe it also has a number of weaknesses that, if addressed, could improve it further.

      Weaknesses:

      The sample size is small (n=21, after exclusions), even for TDLM studies (which typically have somewhere between 25-40 participants). The authors address this somewhat through a power analysis of the relationship between replay and behavioral performance in their simulations, but this is very dependent on the assumptions of the simulation. Further, according to their own power analysis, the replay-behavior correlations are seriously underpowered (~10% power according to Figure 7C), and so if this is to be taken at face value, their own null findings on this point (Figure 3C) could therefore just reflect undersampling as opposed to methodological failure. I think this point needs to be made more clearly earlier in the manuscript. Relatedly, it would be very useful if one of the recommendations that come out of the simulations in this paper was a power analysis for detecting sequenceness in general, as I suspect that the small sample size impacts this as well, given that sequenceness effects reported in other work are often small with larger sample sizes. Further, I believe that the authors' simulations of basic sequenceness effects would themselves still suffer from having a small number of subjects, thereby impacting statistical power. Perhaps the authors can perform a similar sort of bootstrapping analysis as they perform for the correlation between replay and performance, but over sequenceness itself?
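
      The undersampling concern can be made concrete with a minimal Monte Carlo sketch of the power to detect a correlation at n = 21. Everything in it is an assumption made for illustration: the effect sizes, the bivariate-normal data, the number of simulations, and the use of the Fisher z approximation as the significance test. It is not the authors' bootstrapping procedure and is not meant to reproduce Figure 7C.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_power(true_r, n, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples of size n in which a correlation with
    population value true_r is declared significant (two-sided test using
    the Fisher z approximation)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y shares a fraction true_r of its standard deviation with x.
        y = [true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1) for a in x]
        r = pearson_r(x, y)
        if abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit:
            hits += 1
    return hits / n_sims


# Hypothetical effect sizes, evaluated at the sample size discussed above.
for rho in (0.0, 0.2, 0.5, 0.8):
    print(rho, simulated_power(rho, n=21))
```

      Running the same function over a range of sample sizes would give a rough sense of how many participants are needed before a given effect size becomes reliably detectable under these assumptions.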

      The task paradigm may introduce issues in detecting replay that are separate from TDLM. First, the localizer task involves a match/mismatch judgment and a button press during the stimulus presentation, which could add noise to classifier training separate from the semantic/visual processing of the stimulus. This localizer is similar to others that have been used in TDLM studies, but notably in other studies (e.g., Liu, Mattar et al., 2021), the stimulus is presented prior to the match/mismatch judgment. A discussion of variations in different localizers and what seems to work best for decoding would be useful to include in the recommendations section of the discussion. Second, and more seriously, I believe that the task design for training participants about the expected sequences may complicate sequence decoding. Specifically, this is because two images (a "tuple") are shown together and used for prediction, which may encourage participants to develop a single bound representation of the tuple that then predicts a third image (AB -> C rather than A -> B, B -> C). This would obviously make it difficult to i) use a classifier trained on individual images to detect sequences and ii) find evidence for the intended transition matrix using TDLM. Can the authors rule out this possibility?

      Participants only modestly improved (from 76-82% accuracy) following the rest period (which the authors refer to as a consolidation period). If the authors assume that replay leads to improved performance, then this suggests there is little reason to see much task-related replay during rest in the first place. This limitation is touched on (lines 228-229), but I think it makes the lack of replay finding here less surprising. However, note that in the supplement, it is shown that the amount of forward sequenceness is marginally related to the performance difference between the last block of training and retrieval, and this is the effect I would probably predict would be most likely to appear. Obviously, my sample size concerns still hold, and this is not a significant effect based on the null hypothesis testing framework the authors employ, but I think this set of results should at least be reported in the main text. I was also wondering whether the authors could clarify how the criterion over six blocks was 80% but then the performance baseline they use from the last block is 76%? Is it just that participants must reach 80% within the six blocks *at some point* during training, but that they could dip below that again later?

      Because most of the conclusions come from the simulation study, there are a few decisions about the simulations that I would like the authors to expand upon before I can fully support their interpretations. First, the authors use a state-to-state lag of 80ms and do not appear to vary this throughout the simulations - can the authors provide context for this choice? Does varying this lag matter at all for the results (i.e., does the noise structure of the data interact with this lag in any way?) Second, it seems that the approach to scaling simulated replays with performance is rather coarse. I think a more sensitive measure would be to scale sequence replays based on the participants' responses to *that* specific sequence rather than altering the frequency of all replays by overall memory performance. I think this would help to deliver on the authors' goal of simulating an "increase of replay for less stable memories" (line 246). On the other hand, I was also wondering whether it is actually necessary to use the real memory performance for each participant in these simulations - couldn't similar goals (with a better/more full sampling of the space of performance) be achieved with simulated memory performance as well, taking only the MEG data from the participant? Finally, Figure 7D shows that 70ms was used on the y-axis. Why was this the case, or is this a typo?

      Because this is a re-analysis of a previous dataset combined with a new simulation study on that data aimed at making recommendations about how to best employ TDLM, I think the usefulness of the paper to the field could be improved in a few places. Specifically, in the discussion/recommendation section, the authors state that "yet unknown confounders" (line 295) lead to non-random fluctuations in the simulated correlations between replay detection and performance at different time lags. Because it is a particularly strong claim that there is the potential to detect sequenceness in the baseline condition where there are no ground-truth sequences, the manuscript could benefit from a more thorough exploration of the cause(s) of this bias in addition to the speculation provided in the current version. In addition, to really demonstrate that a realistic simulation is necessary (one of the primary conclusions of the paper), it would be useful to provide a comparison to a fully synthetic simulation performed on this exact task and transition structure (in addition to the recreation of the original simulation code from the TDLM methods paper). Finally, I think the authors could do further work to determine whether some of their recommendations for improving the sensitivity of TDLM pan out in the current data - for example, they could report focusing not just on the peak decoding timepoint but incorporating other moments into classifier training.

      Lastly, I would like the authors to address a point that was raised in a separate public forum by an author of the TDLM method, which is that when replays "happen during rest, they are not uniform or close". Because the simulations in this work assume regularly occurring replay events, I agree that this is an important limitation that should be incorporated into alternative simulations to ensure the lack of findings is not because of this assumption.

    1. We cannot hand code a program that exhaustively enumerates all the relevant factors that allow us to recognize objects from every possible perspective or in all their potential visual configurations.

      The need for "learning" over "branching"

    1. Solutions For Individuals Overview Google Workspace Individual For Business Overview Google Workspace Business Small Business Small business productivity tools New Business Tools for new businesses Startups Startup productivity tools For Enterprise Overview Google Workspace Enterprise Frontline Workers Google Workspace for the frontline Work Safer Protect organizations from cyberattacks Developers Education Nonprofits close Products Gmail Custom business email Drive Cloud storage Meet Video conferencing Chat Messaging for teams Calendar Shared calendars Tasks Tasks and task lists Docs Word processing Sheets Spreadsheets Slides Presentation builder Forms Online forms and surveys Sites Team and project sites Gemini app AI assistant NotebookLM AI research assistant Vids Video editor Keep Digital notes AppSheet No-code apps and automations AI Solutions Security Admin console Add-ons See more apps close Industries Industries Industries Healthcare and Life Sciences Retail Manufacturing Government and Public Sector Professional Services Technology Departments Sales Marketing Human Resources Security close AI Pricing Resources Resources See more Discover Security and trust Keep your data safe and compliant Blog Latest product news and stories Customer stories Case studies and videos Learn FAQs Answers to commonly asked questions Training and certification On-demand or classroom training Live and on-demand events Explore events and webinars Video conferencing Learn about Google Meet Connect Partners Find the right partner Marketplace Browse and install apps Integrations Partner and custom integrations Refer Google Workspace Earn rewards with our Referral Program Support for admins Support for users close

      This dropdown feature is this. While you can hover over it to look at the menu options, you can also click on it which is good for someone who can use the mouse or scroll much.

    1. max-1-branch and the todo list

      !!!! In the example prompt they write that it is, after all, allowed to call them in parallel

      Launch multiple agents concurrently whenever possible, to maximize performance; to do that, use a single message with multiple tool uses


      WTF, for a greeting joke?

      <example_agent_descriptions> "code-reviewer": use this agent after you are done writing a signficant piece of code

      "greeting-responder": use this agent when to respond to user greetings with a friendly joke </example_agent_description>

    2. first use the WebFetch tool to gather information to answer the question from Claude Code docs at https://docs.anthropic.com/en/docs/claude-code.

      Maybe we should do the same for questions about the bank?

    3. You should NOT answer with unnecessary preamble or postamble (such as explaining your code or summarizing your action)

      Qwen3 did not follow this well

    4. You should NOT answer with unnecessary preamble or postamble (such as explaining your code or summarizing your action)

      Qwen models REALLY love repeating whatever came back in tool_result; under each one we add a special constant instruction


      when we just wrote it in the prompt, it worked less reliably
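
      The workaround described in this note can be sketched roughly as follows. This is a minimal illustration, not the annotator's actual code: the reminder wording, the helper names (`with_reminder`, `build_tool_message`) and the OpenAI-style message shape (`role`, `tool_call_id`, `content`) are all assumptions.

```python
# Hypothetical reminder text; the wording actually used is not given in the note.
NO_ECHO_REMINDER = "Do not repeat the tool output verbatim; answer concisely."


def with_reminder(tool_result: str, reminder: str = NO_ECHO_REMINDER) -> str:
    """Append the same constant instruction under a raw tool result."""
    return f"{tool_result}\n\n[instruction] {reminder}"


def build_tool_message(tool_call_id: str, tool_result: str) -> dict:
    """Wrap a tool result in an assumed OpenAI-compatible chat message."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": with_reminder(tool_result),
    }


message = build_tool_message("call_1", "balance: 100")
```

      Placing the instruction next to every tool result keeps it close to the text the model is tempted to echo, which may be why it was more reliable than a single line in the system prompt.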

    1. The empires of the 21st century don’t need the Dutch East Company, or soldiers, or muskets, or smallpox. They operate through code, unfair contracts and VC prospectus. Where European powers once laid claim to land, labour and resources, AI companies now lay claim to language, culture and memory

      This analogy kind of reminds me of the analogy used in O'Neil's piece that companies view AI as an arms race, and I feel like the competition between companies with no regard for the consequences is reflected here

    1. (a) Social workers should take adequate measures to discourage, prevent, expose, and correct the unethical conduct of colleagues, including unethical conduct using technology.

      Allen Barsky believes that social workers should avoid communicating online with groups that involve political affiliations or any type of political cause. This is a practice that I will abide by to avoid unethical conduct using technology. If I have colleagues on social media that are violating this code of conduct, I will remind them about the adequate measures we must take as social workers to prevent the unethical conduct of colleagues, including unethical conduct using technology.

    1. First, this is another Chromium-based browser. Second, Chromium/Chrome-based browsers are awful at portablization. Passwords are locked to a single PC, extensions routinely get lost, it doesn't fully work from Unicode paths, etc. They're only just barely kinda portable and only held together with duct tape. And this is entirely due to the Chrome/Chromium code underneath. Firefox, for instance, is *wonderfully* portable and, in terms of portability, is a Ferrari compared to Chrome/Chromium's Yugo. Third, Brave's speed improvement claims are vs a Chromium browser without adblocking. If you use Chrome with uBlock or AdBlockPlus, you'll get the same performance as Brave which negates its only current advantage since the publisher-based micropayment system isn't a thing yet.
      • !!!
    1. In this network of collaboration,

      Understand other groups’ membership criteria • Enforce a code of conduct • Resolve conflicts between organizations • Provide services to members • Read other membership cards without tech integration Network State, Online community, Activist group, Intentional community, DAO, Stateless people, Values-aligned group, Service provider

    2. Challenge: Cross-organizational Trust
      • Understand other groups’ membership criteria
      • Enforce a code of conduct
      • Resolve conflicts between organizations
      • Provide services to members
      • Read other membership cards without tech integration

      Network State, Online community, Activist group, Intentional community, DAO, Stateless people, Values-aligned group, Service provider

    3. a sharing economy

      Not a challenge within your organization - Set membership criteria - Enforce a code of conduct - Resolve internal conflicts - Provide services to your members - Issue membership cards / NFTs / digital ID

    1. dplyr

      It is not at all a bad idea to introduce dplyr here. data.table would be an alternative and is often more efficient on large datasets, but the code is not as readable

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).

      Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be valid reasons - even when women are not intrinsically better at research than men - why a greater fraction of female-authored submissions are accepted relative to male-authored submissions (or vice versa). For example, if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men’s papers are intrinsically better than women’s papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      I would take out the final sentence in the abstract. In my opinion, your survey evidence isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major - or even minor - contribution of your paper. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!) While it's fine to briefly discuss them at the end of your paper - as you currently do - I wouldn't highlight that in the abstract as being an important contribution of your paper.

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel like stating this in the abstract is coherent.

      Minor comments

      What is the rationale for conditioning on academic rank and does this have explanatory power on its own - i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      Thank you for this thoughtful question. We conditioned on academic rank in all regression analyses to account for structural differences in career stage that may potentially influence submission behaviors. Academic rank (e.g., assistant, associate, full professor) is a key determinant of publishing capacity and strategic considerations, such as perceived likelihood of success at elite journals, tolerance for risk, and institutional expectations for publication venues.

      Importantly, academic rank is also correlated with gender due to cumulative career disadvantages that contribute to underrepresentation of women at more senior levels. Failing to adjust for rank would conflate gender effects with differences attributable to career stage. By including rank as a covariate, we aim to isolate gender-associated patterns in submission behavior within comparable career stages, thereby producing a more precise estimate of the gender effect.

      Regarding explanatory power, academic rank does indeed contribute significantly to model fit across our analyses, indicating that it captures meaningful variation in submission behavior. However, even after adjusting for rank, we continue to observe significant gender differences in submission patterns in several disciplines. This suggests that while academic rank explains part of the variation, it does not fully account for the gender gap—highlighting the importance of examining other structural and behavioral factors that shape the publication trajectory.

      Reviewer #2 (Public review):

      Basson et al. present compelling evidence supporting a gender disparity in article submission to "elite" journals. Most notably, they found that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. Overall, this work is an important addition to the study of gender disparities in the publishing process.

      I thank the authors for addressing my concerns.

      Reviewer #4 (Public review):

      Main strengths

      The topic of the MS is very relevant given that across the sciences/academia, genders are unevenly represented, which has a range of potential negative consequences. To change this, we need to have the evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and the impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with a high impact factor. While previous work has detected this gap and identified some potential mechanisms, the current MS provides strong evidence that this gap might be due to a lower submission rate of women compared to men, rather than the rejection rates. These results are based on a survey of close to 5000 authors. The survey seems to be conducted well (though I am not an expert in surveys), and data analysis is appropriate to address the main research aims. It was impossible to check the original data because of the privacy concerns.

      Interestingly, the results show no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking and are advised not to submit to prestigious journals, indicating that both intrinsic and extrinsic factors shape women's submission behaviour.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, but also to inform assessment reform at a larger scale.

      I do not find any major weaknesses in the revised manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Colour schemes of the Figures are not adjusted for colour-blindness (red-green is a big NO), some suggestions can be found here https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf

      We appreciate the suggestion. We’ve adjusted the colors in the manuscript to be color-blind friendly using one of the colorblind safe palettes suggested by the reviewer.

      (2) I do not think that the authors have fully addressed the comment about APCs and the decision to submit, given that PNAS has publication charges that amount to double of someone's monthly salary. I would add a sentence or two to explain that publication charges should not be a factor for Nature and Science, but might be for PNAS.

      While APCs are definitely a factor affecting researchers’ submission behavior, they mostly do so for lower-prestige journals rather than for the three elite journals analyzed here. As mentioned in the previous round of revisions, Nature and Science have subscription options. And PNAS authors without funding have access to waivers: https://www.pnas.org/author-center/publication-charges

      (3) Line 268, the first suggestion here is not something that would likely work. Thus, I would not put it as the first suggestion.

      We made the suggested change.

      (4) Data availability - remove AND in 'Aggregated and de-identified data' because it sounds like both are shared. Suggest writing: 'Aggregated, de-identified data..'. I still suggest sharing data/code in a trusted repository (e.g. Dryad, ZENODO...) rather than on GitHub, as per the current recommendation on the best practices for data sharing.

      Thank you for your comment regarding data availability. Due to IRB restrictions and the conditions of our ethics approval, we are not permitted to share the survey data used in this study. However, to support transparency and reproducibility, we have made all analysis code available on Zenodo at https://doi.org/10.5281/zenodo.16327580. In addition, we have included a synthetic dataset with the same structure as the original survey data but containing randomly generated values. This allows others to understand the data structure and replicate our analysis pipeline without compromising participant confidentiality.

    1. Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. Their comparisons to the original SpliceAI models are convincing on the grounds of model performance and their evaluation of how well the new models match the original's understanding of non-local mutation effects. However, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of the limitations of calibration.

      Strengths

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance and mutation effect estimation capabilities of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      Weaknesses

      (1) Their discussion of their package's calibration functionality does not adequately acknowledge the limitations of model calibration. This is problematic as this is a package intended for general use and users who are not experienced in modeling broadly and the subfield of model calibration specifically may not already understand these limitations. This could lead to serious errors and misunderstandings down the road. A model is not calibrated or uncalibrated in and of itself, only with respect to a specific dataset. In this case they calibrated with respect to the training dataset, a set of canonical transcript annotations. This is a perfectly valid and reasonable dataset to calibrate against. However, this is unlikely to be the dataset the model is applied to in any downstream use case, and this calibration is not guaranteed or expected to hold for any shift in the dataset distribution. For example, in the next section they use ISM based approaches to evaluate which sequence elements the model is sensitive to and their calibration would not be expected to hold for this set of predictions. This issue is particularly worrying in the case of their model because annotation of canonical transcript splice sites is a task that it is unlikely their model will be applied to after training. Much more likely tasks will be things such as predicting the effects of mutations, identification of splice sites that may be used across isoforms beyond just the canonical one, identification of regulatory sequences through ISM, or evaluation of human created sequences for design or evaluation purposes (such as in the context of an MPSA or designing a gene to splice a particular way), we would not expect their calibration to hold in any of these contexts. To resolve this issue, the authors should clarify and discuss this limitation in their paper (and in the relevant sections of the package documentation) to avoid confusing downstream users.

      (2) The clarity of their analysis of mutation effects could be improved with some minor adjustments. While they report median ISM importance correlation it would be helpful to see a histogram of the correlations they observed. Instead of displaying (and calculating correlations using) importance scores of only the reference sequence, showing the importance scores for each nucleotide at each position provides a more informative representation. This would also likely make the plots in 6B clearer.

    2. Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species pre-training on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is not even any comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      Update for the revised version:

      The update includes mostly clarifications for tech questions/comments raised by the other two reviewers. There is no additional analysis/results that changes our above initial assessment of this paper's contribution.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      Following is the revised paragraph in the manuscript’s Results section:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity between the resultant patterns between SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The Pearson correlation between SpliceAI and OSAI<sub>MANE</sub> yielded an overall median correlation of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub>  and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation was 0.857 for all splice sites. Ten additional zoom-in representative splice site DNA logo comparisons are provided in Supplementary Figure S23.
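
      The concordance summary can be illustrated with a short sketch, assuming the ISM profiles of the two models are stored as two arrays with one row per splice site. The function name and the toy arrays are hypothetical; only the per-site Pearson correlation and the median across sites come from the text.

```python
import numpy as np


def median_profile_correlation(profiles_a, profiles_b):
    """Per-site Pearson r between paired ISM profiles, plus the median over sites.

    Both inputs have shape (n_sites, window_length); row i of each array must
    describe the same splice site.
    """
    a = np.asarray(profiles_a, dtype=float)
    b = np.asarray(profiles_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("profile arrays must have identical shapes")
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson r.
    # A constant profile would give NaN here and should be handled upstream.
    per_site = np.array([np.corrcoef(x, y)[0, 1] for x, y in zip(a, b)])
    return per_site, float(np.median(per_site))


# Toy data: three sites, window length four.
model_a = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 0, 1, 0]]
model_b = [[2, 4, 6, 8], [4, 3, 2, 1], [1, 0, 1, 0]]
per_site_r, median_r = median_profile_correlation(model_a, model_b)
```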

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e. a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we did not measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
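
      As an illustration of temperature scaling, the sketch below divides the logits by a single scalar T before the softmax and picks T by minimizing the negative log-likelihood on held-out labels. The grid search, the function names, and the toy logits are assumptions made for this example; the response does not state how the temperature was optimized.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true classes at a given temperature."""
    p = softmax(logits, temperature)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels])))


def fit_temperature(logits, labels, grid=None):
    """Pick the temperature with the lowest held-out NLL (simple grid search)."""
    if grid is None:
        grid = np.linspace(0.25, 4.0, 376)  # 0.25 to 4.0 in steps of 0.01
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


# Toy held-out set: three classes, one confident mistake out of four positions.
toy_logits = np.array([[4.0, 0, 0], [4.0, 0, 0], [0, 4.0, 0], [0, 0, 4.0]])
toy_labels = np.array([0, 1, 1, 2])
best_t = fit_temperature(toy_logits, toy_labels)
```

      Because every logit at a position is divided by the same positive T, the ordering of the classes at that position is unchanged; only the confidence moves. A fitted T close to 1 corresponds to a model that needed almost no adjustment.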

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. We have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, designed to analyze cross-species performance gains, and supported by a thorough benchmark and the release of several pretrained models to clearly position our contribution.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result indicates that our architecture seems to capture all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.
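
      The threshold step of this filter can be sketched as below. This covers only the "> 80 % overlap and > 80 % identity" rule, not the paralogy/homology screen, and it is not the authors' pipeline: the `Hit` record, its field names, and the definition of overlap as aligned length divided by query length are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment of one held-out sequence against a training sequence."""
    query_id: str        # validation or test sequence
    query_length: int    # length of that sequence
    aligned_length: int  # bases of the query covered by the alignment
    identity: float      # fraction of identical bases in the alignment (0 to 1)


def leaky_ids(hits, min_overlap=0.8, min_identity=0.8):
    """IDs of held-out sequences exceeding BOTH thresholds against any training hit."""
    return {
        h.query_id
        for h in hits
        if h.aligned_length / h.query_length > min_overlap and h.identity > min_identity
    }


def filter_held_out(seq_ids, hits, min_overlap=0.8, min_identity=0.8):
    """Drop flagged sequences, keeping the original order of the rest."""
    drop = leaky_ids(hits, min_overlap, min_identity)
    return [s for s in seq_ids if s not in drop]


hits = [
    Hit("g1", 1000, 900, 0.95),  # high overlap and identity: removed
    Hit("g2", 1000, 900, 0.70),  # identity too low: kept
    Hit("g3", 1000, 500, 0.99),  # overlap too low: kept
]
kept = filter_held_out(["g1", "g2", "g3", "g4"], hits)
```

      Applying the same function to the validation and test lists is what keeps the two held-out sets consistent with each other.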

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with:  conda install openspliceai. 

      The conda package homepage is at: https://anaconda.org/khchao/openspliceai We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceaimane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Response to reviewers


      We thank the reviewers for their constructive feedback, which has greatly improved the clarity and rigor of our manuscript. We have carefully addressed each comment below, indicating changes made to the text, figures, or supplementary material where appropriate. References to line numbers correspond to the revised version of the manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      * In this paper, the authors focus on the role of Reticulon-1C in concert with Spastin in response to axonal injury. In data mining, they find axonal mRNAs encoding for ER-associated proteins including Rtn-1. They establish a knockdown targeting both Rtn-1 isoforms Rtn-1A and Rtn-1C. They observe decreased beta-3-Tubulin levels in the soma while axonal protein levels are unchanged. In microfluidic devices, they characterise the effect of a compartment-specific Rtn-1 KD on axonal outgrowth in the axonal compartment. The authors quantify axonal outgrowth, seeing increased outgrowth in an axonal compartment-specific Rtn-1 KD, while the effect seems to be reversed when applying the KD construct in the somatic compartment. When focussing on the axonal growth cone, they find the Rtn-1 KD shows differences in several morphological features of the growth cone. They find an increase in Tubulin levels in an axonal compartment-specific, but a decrease in a somatic compartment-specific Rtn-1 KD. Colocalisation of Rtn-1C and Spastin is shown to be monolaterally increased following axotomy. Combining axotomy with the Rtn-1 KD shows increases in dynamic microtubule growth rates and track lengths. In another model system, neuron balls, they show Rtn1-C, but not Rtn1-A to be present in the axon. In a puro-PLA assay they also show it can be synthesised in the axonal compartment. To investigate the mechanism enabling the cooperation between Spastin and Rtn-1C, they move to a cell line model in which they see a correlating distribution between Spastin and Rtn-1C but not Rtn-1A. Finally, they use in silico modelling to speculate on binding between Spastin domains and Rtn-1 isoforms.*

      Major comment:

      The rationale behind the work is convincing, however some interpretations are presented as more robust than some data allow. Most notably, while the interaction between Rtn-1 and Spastin has been shown prior to this study, it is only presented here through in silico analysis. In figure 5, an increase in the growth rate of dynamic microtubules is observed in either a Rtn-1C KD or by using a Spastin-inhibitor. Due to a described increase in colocalisation between Rtn-1C and Spastin (5A), the increase in growth rate is displayed as caused by Rtn-1 promoting Spastin's severing ability. This result might however be correlative. Further, in the injured samples, Spastin levels seemingly increase (in the representative images) and it is thus not surprising that the level of Rtn-1C colocalising with Spastin increases as well. This might not be indicative of a cooperation and further experimental evidence is required.

      R: We thank the reviewer for this thoughtful comment. We agree that our interpretation should be more cautious, and we have revised the Title, Results and Discussion sections accordingly. In particular:

      1. Following your comments and those of the other reviewers, we analyzed a new set of experiments using STED images of non-injured and injured axons. To reduce the risk of artifacts, we avoided deconvolution and worked directly with raw STED images (Figure 5A). Under these conditions, the distribution of Spastin and its intensity in distal axons are not modified by injury, nor are those of Rtn-1C and Spastin (Supplementary Figure 4). We emphasize in the revised text that the in silico modeling we present is supportive, but not definitive, of a direct interaction. To address this concern, we clarify that our study builds on prior evidence of biochemical interaction between Rtn-1C and Spastin (Mannan et al., 2006), and that our own data demonstrate: i) compatible subcellular distribution in axons by super-resolution (STED microscopy, Figure 5A); ii) a potential functional interplay in axons (rescue of β3-tubulin levels by Spastin inhibition, Figure 5B); and iii) isoform-specific co-distribution with Spastin in heterologous cells that is associated with changes in microtubule integrity (see improved Figure 7). Together, these results go beyond correlative localization, but we acknowledge that they do not directly demonstrate a molecular complex in axons. Thus, we now indicate that "Although we did not directly test their molecular association, these results are consistent with Rtn-1C and Spastin sharing a similar subcellular localization, potentially enabling their functional interaction in distal axons" (lines 285-287).

      We would like to clarify a possible misunderstanding: in our experiments, the increase in microtubule growth rate was observed after axonal Rtn-1 KD. Spastazoline (SPTZ) only prevented the reduction in β3-tubulin levels induced by Rtn-1 KD, while leaving the KD-driven increase in growth rate and track length unaffected (Figures 5B-E). Thus, our interpretation is that axonal Rtn-1 KD correlates with increased Spastin function. (lines 307-309)


      Other comments:

      • Generally, graphs would benefit from individual values plotted as well as the summary. Font sizes and types (but rarely) are sometimes inconsistent. Proteins should be consistently written (capitalised or not).

      R: We agree with the reviewer and thank them for taking the time to point out these inconsistencies, which significantly affect the quality of the work. We have improved several figures and added graphs plotting individual values (Figures 2C, 2E; 4A-E; 5E; 6D). We have also reviewed font sizes and types more carefully and made the capitalization of protein names consistent.

      • *Table 1 and figure 1 present data collected from a vast amount of resources. It should be highlighted that datasets from which data was obtained includes many different models, different DIVs and neuronal cell types. Figure 1B may benefit from a different colour scheme. "Ex-vivo" should be "Ex vivo". For "ER mRNAs are a relevant category" it is not described what "relevant" would mean in this context. The title might remove this small part or describe it in the text. It should be described how it is decided that mRNAs are "common". *

      R: We have now highlighted in the Results section the diverse origins of the analyzed samples. We removed the indicated part from the text and explained that common mRNAs were chosen based on the Benjamini-Hochberg (Ben) analysis. (Page 33, lines 1299-1304).

      * - Figure 2: add description to y-axis to describe what fold change is displayed, applies to multiple figures. Will improve readability of the figures. In 2C, the ROI showing neuronal somata should be increased to show part of the axon and not cut off the soma.*

      R: We thank the reviewer for taking the time to highlight this. We have included this modification in Figure 2 and throughout the article. We have also enlarged the indicated ROIs in Figure 2C as requested. (Page 34)

      • *Figure 3: Three out of four axonal compartments seem to be comprised of dying or damaged axons. Especially the axonal KD scrambled image. It should be ensured that neuronal cultures are healthy. *

      R: We completely agree with the reviewer that the selected images did not reflect the generally good health of the axons, which is supported by the lack of fragmentation and by the functional responsiveness shown in Figures 4 and 5B, C, E. We have therefore replaced the previous axonal fields with more representative ones (Figure 3). (page 36)

      Typo in "intersections". The schematic of 3B is a great addition to explain the graphs above. Perhaps it could be a bit refined as it is currently hard to see whether this is a neuron or a growth cone without context. Maybe show where the axon connects to the depicted growth cones and change the third icon which looks like it was crossed out. Small formatting issues: remove additional space bar before "Figure 3." And add after "Bar"

      R: Many thanks for these great suggestions. We have now improved the figures as suggested and fixed the indicated formatting issues. (page 36)

      - Figure 4: If not misunderstanding what is depicted, in 4A and B, different lookup tables are used to depict the same signal. Only one of each images is necessary. Do the axons have more tiny branches in the Rtn-1 KD condition in 4A? Unclear why Rtn-1 levels are increased in the Rtn-1 KD (4C), please clarify.

      R: We thank the reviewer for these observations. The reviewer is correct that different lookup tables were initially applied to the same image. Our intention was to highlight the fine distribution of axonal Rtn-1, but since this aspect is already clearly shown in previous figures, we now retain only a single lookup table. The appearance of tiny branches in the Rtn-1 KD condition represents an isolated observation and does not reflect a consistent or robust phenotype associated with Rtn-1 KD.

      As the reviewer points out, the increase of Rtn-1 in the cell bodies of injured neurons following axonal KD was initially surprising to us. However, this was a consistent phenomenon, as shown in the improved Figure 4. Of note, previous studies have reported that total Rtn-1C (but not Rtn-1A) levels increase in response to injury in cortical neurons (Fan et al., 2018). In our case, we interpret this as a compensatory somatic response triggered by the local reduction of Rtn-1 in injured axons. This interpretation is also consistent with the apparent lack of effect of siRNA on distal axonal Rtn-1 levels when applied locally after injury (while somatic application of the same siRNA does reduce axonal Rtn-1). Thus, after 24 hours of KD, the somatic upregulation of Rtn-1 may partially compensate for its expected local synthesis decrease. We have clarified this assumption in the revised text. (lines 247-251)

      - Figure 5: It may be easier to understand what "axotomy" samples are if just referred to as "injured" as later in the same figure. The procedure could also very briefly be explained in the results. 5C should depict AUC in µm2 not µm. 5D Spastin is barely visible, brightness and contrast should be adjusted to enhance visibility.

      R: We thank the reviewer for these helpful suggestions and have implemented the requested changes in Figure 5. Specifically:

      We now consistently refer to "axotomy" samples as "injured" throughout the figure and article. In addition, a brief explanation of the axotomy procedure has been added before Figure 2 and before Figure 5, and the description has been clarified in Materials and Methods (lines 191-192, 289-290, and 779-787).

      To improve the reproducibility of our outgrowth measurements, we revised this analysis approach. Based on previous work from a co-author (McCurdy et al., 2019), instead of reporting the "relative number of intersections," we now present the total counts obtained from Sholl analysis of binarized axons (see Methods). To this end, we took advantage of the NeuroAnatomy plugin of FIJI, which more precisely tracks axon trajectories and makes the measurement more independent of axon width. This new approach also avoids the ambiguity in defining the "first line" after the groove ends, which was somewhat arbitrary. Accordingly, the correct term is now "summation of intersections (sum.)" at different distance bins, as reflected in Figure 5D. (page 40)

      For the former Figure 5D (now Figure 5B), we have improved the acquisition of representative images and applied a different set of lookup tables to enhance visibility. (page 40)

      - Figure 6: It should be made clear why it is necessary to switch to another model system just for 6A, please indicate this in the text. PCR bands seem very pixelated, check the quality. It is unclear why soma genes/proteins were only tested with either PCR or WB others with both. Rtn-1C and Rtn1-A should be presented in the same order in the PCR and WB panel. Correct "Rtn1-1A" typo. In 6D, 1.5 dots per soma seems like a low number. When normalized to the area the soma vs the axon occupies, the compartmentalization does not work? Maybe it makes sense to refine analysis or apply puromycin in the somatic compartment and analyze the axonal compartment as comparison?

      R: Many thanks for these observations. We have now included the following clarification in the text: "We sought to characterize the isoform expression of Rtn-1 mRNA and protein in both axons and cell bodies. Because microfluidic chambers yield only limited cellular material, we adopted an alternative culture approach using 'neuronballs.' This method enables the segregation of an axon-enriched fraction by mechanically separating axons from somato-dendritic structures" (lines 375-376).

      The resolution of the PCR bands has been improved in the revised figure. Note that because the amount of cellular material is relatively scarce, the bands are not very strong.

      The difference in the genes/proteins used for characterizing RNA and protein samples reflects our intention to treat both approaches as complementary. The PCR markers were primarily included to confirm sample purity, which also applies to the WB samples since they derive from the same preparation. In both assays, we used MAP2 as a dendritic marker to demonstrate axonal purity. While we acknowledge that the same genes could have been tested by both methods, we believe the results as presented adequately demonstrate the effective isolation of axons.

      We have switched the order of Rtn-1C/1A for consistency across PCR and WB panels and corrected the indicated typo in Figure 6A.

      We agree with the reviewer that an average of 1.5 puncta per soma initially appeared low. We have identified at least three reasons for this:

      First, the signal derives from only a 15-minute puromycin pulse, which is a very short labeling window. Second, our puro-PLA assay is particularly stringent, as ligation relies directly on puromycin- and Rtn-1C-labeled primary antibodies, without the additional spacing normally introduced by secondary antibodies. In standard PLA, the critical distance for amplification is ~30-40 nm, whereas in our assay this distance is even more restrictive. Third, in our initial analysis we applied an overly cautious threshold to define "true" amplification. We have now refined this threshold using a baseline defined by the absence of puromycin stimulation. With this improved criterion, we now quantify an average of ~5 puncta per soma and ~10 puncta per 1000 µm² of axonal area (Figure 6D and Supplementary Figure 3D). Assuming a neuronal soma diameter of 15 µm (area ≈ 176.71 µm²), this yields ~0.028 puncta per µm² in the soma. In comparison, axons display ~0.01 puncta per µm², approximately one-third of the soma value, which is compatible with the idea that cell bodies dominate neuronal protein synthesis.
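
      The density comparison above can be reproduced with a few lines of arithmetic. The only inputs are the figures quoted in the response (15 µm soma diameter, ~5 puncta per soma, ~10 puncta per 1000 µm² of axon); treating the soma as a circular cross-section is the assumption implied by the stated area, and the variable names are chosen for this example.

```python
import math

soma_diameter_um = 15.0
puncta_per_soma = 5.0
axon_puncta = 10.0
axon_area_um2 = 1000.0

# Area of a circle with the assumed soma diameter.
soma_area_um2 = math.pi * (soma_diameter_um / 2) ** 2

# Puncta per square micrometre in each compartment.
soma_density = puncta_per_soma / soma_area_um2
axon_density = axon_puncta / axon_area_um2

# Axonal density expressed as a fraction of the somatic density.
axon_to_soma_ratio = axon_density / soma_density

print(f"soma area    = {soma_area_um2:.2f} um^2")
print(f"soma density = {soma_density:.3f} puncta/um^2")
print(f"axon density = {axon_density:.3f} puncta/um^2")
print(f"axon / soma  = {axon_to_soma_ratio:.2f}")
```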

      Following the reviewer's valuable suggestion, we performed additional quantifications in which puromycin was applied exclusively to the somatic compartment. Under these conditions, we still observed amplification in axons (~5 puncta per 1000 µm²), although this value was significantly lower than when puromycin was applied directly to axons. This analysis provided a novel appreciation of the puro-PLA technique in neurons: at least half of the signal originates in the axonal compartment, while a portion may reflect proteins synthesized in soma and transported anterogradely to the axon through yet-unknown mechanisms (potentially involving rapid anterograde transport) (Figure 6D). (page 42)

      • Figure 7: 7A shows two images depicting the same information that may not be needed. Can probably be removed. In 7B there is no negative (or any) correlation between Spastin levels and Tubulin, however later it is mentioned that Rtn-1C transports Spastin thus causing a decrease in Tubulin at certain locations? It is unclear if Spastin levels vary intensely between different samples. Mean intensity of the somatic area may be beneficial to rule this out. 7B Tubulin on the right top panel seems to have a decrease in Tubulin levels which is not visible due to the Y axis of Tubulin being set to a different range than the middle and lower panel. The average of line scans from multiple cells may be helpful to determine whether there is indeed no colocalization between Rtn-1A and Spastin. The provided representative images seem to show similar degrees of colocalization between Spastin and Rtn-1A/C.

      R: We thank the reviewer for these valuable observations and acknowledge that Figure 7 may have caused confusion. We have removed the fluorescence line-scan traces, as they can be biased depending on the region of the cell analyzed. Although this may not have been sufficiently emphasized in the text, we had already performed a quantitative colocalization analysis across multiple cells and independent experiments, using Manders' coefficients (Figure 7B). These analyses showed higher colocalization between Rtn-1C and Spastin compared to Rtn-1A. Removing the traces also addresses the concerns about variability in Spastin levels and possible bias from Y-axis scaling. In addition, we had already quantified the total tubulin fluorescence intensity across all the z-stacks and from multiple cells from independent experiments, as shown in Figure 7C. To further rule out artifacts caused by variable transfection efficiency, we quantified total fluorescence intensity in both RFP and GFP channels across conditions. As shown in Supplementary Figure 6, no significant differences were observed, suggesting that the changes in tubulin reflect specific effects of Spastin/Rtn-1C co-expression rather than variability in expression levels.
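
      For readers unfamiliar with Manders' coefficients, the sketch below shows the usual definition: M1 is the fraction of channel A intensity found in pixels where channel B is also present, and M2 is the converse. The thresholds, function name, and toy images are assumptions for this example; the response does not state how the authors thresholded their images or which software they used.

```python
import numpy as np


def manders(ch_a, ch_b, thr_a=0.0, thr_b=0.0):
    """Return (M1, M2) for two same-shaped intensity images.

    Pixels at or below a channel's threshold are treated as background (zero).
    """
    a = np.asarray(ch_a, dtype=float)
    b = np.asarray(ch_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("channels must have the same shape")
    a = np.where(a > thr_a, a, 0.0)
    b = np.where(b > thr_b, b, 0.0)
    # Guard against an empty channel, where the coefficient is undefined.
    m1 = a[b > 0].sum() / a.sum() if a.sum() > 0 else float("nan")
    m2 = b[a > 0].sum() / b.sum() if b.sum() > 0 else float("nan")
    return float(m1), float(m2)


# Toy 2x2 images: the channels overlap in the top-left pixel only.
channel_a = [[10, 0], [5, 5]]
channel_b = [[1, 1], [0, 0]]
m1, m2 = manders(channel_a, channel_b)
```

      Unlike a single line scan, the coefficients are computed over every pixel of the image, which is why they are less sensitive to the choice of region.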

      Results: - It would be helpful to reiterate the hypothesis at the start to ease the reading flow.

      R: Many thanks; we have introduced a line reiterating the hypothesis as suggested (lines 117-118).

      - There seems to be minor redundancy in lines 132-138.

      R: Indeed, we have now removed the indicated phrase.

      • There are several spellings, proof-reading is recommended. For example, in line 136 should be "promotes". 160 "localla", 192 should be "the actin cytoskeleton".,194 should be "we first examined", 195 should be "Different", 223 "using", 259 "axons".

      R: We apologize for the spelling errors; we have now carefully proofread the manuscript and introduced these corrections.

      - 154-155: Unclear, why the lower MW Rtn-1C was seen as more important.

      R: We apologize for not being clear enough. It is not necessarily more important; we simply took the Rtn-1C molecular weight as the reference for the analysis because this isoform is the predominant one in axons. In any case, we found a significant effect on both isoforms, at least for siRNA 2 (data not shown), which is now stated in the text (lines 165-169): "We also examined the 180 kDa band and found that siRNA 1 reduced expression to a mean of 0.41 relative to Scr, showing a strong trend that did not reach statistical significance (p = 0.05; N = 3; Wilcoxon test compared to 1, data not shown). In contrast, siRNA 2 further reduced expression to a mean of 0.29, which was statistically significant (p = 0.04; N = 3; Wilcoxon test compared to 1, data not shown)."

      - 167 results of 2E not stated before interpreting them.

      __R: __We have corrected this mistake.

      - 181 would suggest "outline" instead of "perimeter".

      __R: __We have considered this suggestion and included "outline". However, because the morphometric parameter is defined as perimeter, we retained that term and added the suggested clarification.

      - 183-184 "longest shortest path" is a confusing term.

      __R: __We agree that it is a confusing term. We have therefore introduced several clarifications of the term in the legend of figure 3 (page 36) and, in more detail, in a new section of Materials and methods (lines 697-699).

      • figure 4B should be referenced earlier in the sentence.

      __R: __We have corrected the sentence in the text.

      - 243-244 may be correlation. Rtn-1 and Spastin do not necessarily interact so that this result is achieved.

      __R: __Thanks for the clarification. We recognize that, at this point in the manuscript, the original conclusion was not justified; we have therefore stated at the end of the paragraph: "Together, these observations suggest that axonal Rtn-1 KD correlates with higher Spastin microtubule severing" (lines 307-309).

      - 246: In figure 1 the KD seemed to influence both Rtn-1 isoforms, why not here anymore? 259 "axons". 284 "counteract" instead of "suppress"?

      __R: __We acknowledge the confusion at this point of the article caused by measuring a specific isoform. We now indicate that we focus on Rtn-1C because previous evidence in the literature points to an interaction of Rtn-1C with Spastin (lines 264-267). Later we show that Rtn-1C is the predominant isoform in axons (Figure 6). We have implemented all the suggested corrections in the manuscript.

      - 485: rephrase as the interaction between Rtn-1C with Spastin has not been shown directly in these experiments.

      __R: __Many thanks for this relevant clarification. We have now corrected the sentence: "Here, we have described an emerging mechanism relating Rtn-1C with the activity of Spastin, which is the most frequently mutated isoform in HSP (Hazan et al., 1999; Mannan et al., 2006)." (lines 632-634).

      Methods: 535 "in PBS". 543 citation error. 689-699 is it necessary to add a gaussian blur?

      __R: __We have corrected the words and removed the wrong reference. The use of Gaussian blur is a very important point. We used this approach because, in our experimental conditions, it was critical to highlight moving particles that would otherwise go unnoticed because of noise. This was particularly evident for the seemingly more "unorganized" movements of axonal microtubules after injury.

      References: Mannan, A U et al. appears twice in the citation list (36 and 44).

      __R: __Many thanks for the observation. We have now corrected it.

      Reviewer #1 (Significance (Required)):

      Overall, this manuscript describes novel findings which will be interesting to the neuronal cell biology community and scientists working in the field of neuronal injury and regeneration. It is well structured, and the data are mostly well presented but sometimes conclusions are over-interpreted. However, several points need to be addressed in a more convincing way.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Axonal mRNA localization and localized translation support many neuronal functions and is an important determinant of the regenerative potential of axons after injury. How this works mechanistically remains unclear. The authors present a well performed and technically challenging study in which they identify RTN-1 as a regulator of axonal outgrowth after injury. They provide evidence using experiments in microfluidic chambers that RTN1 is locally synthesized in axons. Interestingly, they identify a (local) interplay between RTN1 and Spastin which affects microtubules and thereby regulates the outgrowth of cortical axons after injury. This study provides an interesting new link between a locally synthesized protein (RTN1) and a microtubule-regulating protein Spastin that is changed upon axon injury. This provides an advance in our understanding in axon regeneration after injury and provides the basis for new studies that can further investigate this interplay. Although interesting, I have several concerns that should be clarified and are needed to substantiate the findings and model presented in this study.

      Major concerns:

      1. In figure 1, the authors provide an analysis of overlapping axonal mRNAs. There are more axonal transcriptome studies and a recent study by von Kugelgen and Chekulaeva (2020; doi: 10.1002/wrna.1590) already performed such an analysis, which included more studies. It would be good to mention this. It can be perceived that studies were now chosen to get the outcome that Rtn-1 is present in all studies. For example, von Kugelgen finds mRNA coding for RTN3, another ER structural protein, as present in 16 out of 20 studies analyzed. That said, the authors present more reasons to look at Rtn-1, so the selection to continue with this protein remains valid but can be written up differently so not to present it as the 'sole' ER-shaping protein consistently present in axonal transcriptomes.

      __R: __We appreciate this important observation, which enriches the article; we are aware that the transcriptome data can be expanded further to include more recent studies. We have therefore included this reference in the main text and highlighted the relevant finding on RTN3. However, von Kugelgen and Chekulaeva used data from both dendrites and axons (neurites). Thus, we indicate that "...On a similar approach, but combining data from dendrites and axons, it was found that Reticulon-3 mRNA is present in 16 out of 20 studies, further suggesting a wider presence of other mRNAs coding for ER structural proteins in axons" (lines 128-131).

      2. The description of methods is currently insufficient and incomplete and does not allow for reproducibility of this study. For example, different Rtn-1 antibodies seem to be used in this study. Is the same antibody used for staining and WB? There is no listing of any of the antibodies used in the study and which one is used for which technique/experiment. This should be clarified and should be easy to do so in the methods section (antibody name, origin/company, dilution used) to enhance reproducibility of this study. This is not limited to primary antibodies and any information on secondary antibodies, including what was used for STED is completely missing.

      __R: __Thanks for these critical comments. First, we apologize that the previous version of the Methods was not as accurate as it should have been. We have now revised it and improved several points throughout this section. The primary and secondary antibodies, plasmids, siRNAs, and general reagents are all listed in the Supplementary material, including company and dilution ("Reagent tables").

      • The timeline of KD experiments in Figures 2 and 3 is unclear. For the Western blot KD is performed at DIV7 and collected 48 hours later. However, this is not specified for the stainings done in Figure 2C-E. Is this also at DIV7 and then for 48 hours? In figure 3 the siRNA is added at DIV8 (together with axotomy) and outgrowth is measured 24 hours later. Is 24 hours sufficient to achieve knockdown? Is this also what was done for stainings? Later on in Figure 5B, 48 hours of KD is again used. It is unclear what the rationale of these differing timepoints is. Why was this chosen? Is the timeline also the reason for the difference in segment lengths chosen? In Figure 3, there is a significant effect on outgrowth in the KD in the 'mid-range' which is not present in Figure 5.

      __R: __We regret the confusion; all this information is now explicitly clarified in the main text (lines 297-299) and the corresponding figure legends. We had strong reasons for using these different time points. Figure 2A-B is aimed at validating the siRNA against Rtn-1, so we treated 7 DIV cultures for 48 hours to be sure of revealing a global effect by WB. In figure 2C-D, we used the same 7 DIV cultures, but only for 24 hours. The reason is that, once the RNAi was validated, we explored its control of local synthesis over a shorter period, based on previous literature showing that axonal KD for 24 hours is sufficient to regulate axonal transcripts (Batista et al., 2017; Gracias et al., 2014; Lucci et al., 2020). Our confidence in this time point is also supported by the new supplementary figure 3D, which shows a significant decrease in puro-PLA signal (indicative of Rtn-1C synthesis) 24 hours after axonal KD.

      In figure 3, we performed axotomy, so we had to wait longer for axons to grow (8 DIV) before fully cutting them; in this case, we performed axonal KD from 8 to 9 DIV. This is the same period used for the staining and quantifications shown in figure 4. All this is now clarified in the main text and figures.

      In Figure 5 we performed a more challenging experiment that required transfecting cells with an EB3-GFP plasmid and then performing axotomy along with axonal KD and pharmacological treatment selectively in the axonal compartment. We first tried to measure microtubule dynamics within the same time frame as in figure 3. However, expression levels of EB3-GFP were not adequate for axonal measurements by live-cell imaging. Therefore, compared to figure 3, we extended the time after axotomy by 24 hours (from 9 to 10 DIV), both for this technical reason and to explore whether the changes in tubulin intensity might be revealed more clearly (which was the case, figure 5B). These considerations are now included in the main text.

      Regarding the significant effect on outgrowth in the KD in the 'mid-range', which is not present in Figure 5: because in figure 5D axons are left to grow for two days instead of one, the number of intersections and the differences between conditions change compared to figure 3, while the overall trends are retained. Note that, to improve the reproducibility of our outgrowth measurements, we revised the analysis approach. Based on previous work by a co-author (McCurdy et al., 2019), instead of reporting the "relative number of intersections," we now present the total counts obtained from the Sholl analysis of binarized axons (see Materials and methods). To this end, we took advantage of the NeuroAnatomy plugin of FIJI, which precisely tracks axon trajectories and makes the measurements less dependent on axon width segmentation. This new approach also avoids the problem of defining the "first line" after the groove ends, which was somewhat arbitrary. Accordingly, the correct term is now "summation of intersections (sum.)" at different distance bins, as reflected in the new Figure 5D.
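      The idea of summing intersections within distance bins can be illustrated with a simplified sketch. This is not the plugin's algorithm: it assumes a binarized image stored as nested lists, in which each row is one sampling line at increasing distance from the end of the grooves, and the function names, step size, and bin width are hypothetical.

```python
def count_crossings(row):
    """Count separate foreground runs (axon crossings) along one sampling line."""
    crossings = 0
    previous = 0
    for pixel in row:
        if pixel and not previous:
            crossings += 1  # a new axon segment starts at this pixel
        previous = pixel
    return crossings


def summed_intersections(mask, step_um, bin_um):
    """Sum crossings per distance bin.

    mask: binary image; row i is the sampling line at distance i * step_um.
    Returns a dict mapping the start of each bin (in um) to the summed crossings.
    """
    sums = {}
    for i, row in enumerate(mask):
        bin_start = int((i * step_um) // bin_um) * bin_um
        sums[bin_start] = sums.get(bin_start, 0) + count_crossings(row)
    return sums


# Toy mask (made-up data): three axons near the grooves, one reaching farther.
toy_mask = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
sums_per_bin = summed_intersections(toy_mask, step_um=50, bin_um=100)
```

      Reporting a sum per bin rather than a value relative to a chosen "first line" removes the dependence on where that reference line is placed, which matches the motivation given above.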

      Could the authors provide a rescue condition for their siRNA (using a siRNA-resistant construct) to show that their siRNA is specific for RTN1. They nicely show the efficiency of the siRNA but not its specificity. This is crucial because if not specific, this will affect a large part of their study. They already have RTN1A and RTN1C constructs available. Such a rescue experiment should ideally also be performed for one or more of their phenotypic experiments, such as the one presented in Figure 3A or 5 to show that the phenotype is really RTN1 dependent. If done by re-expressing either RTN1A or RTN1C, this could provide insightful information on the relevant isoforms.

      __R: __We agree with the reviewer that this is a critical point. A major challenge in demonstrating the functional role of axonally synthesized proteins using a KD approach is that the rescue may also need to occur locally. Since axonal Rtn-1 appears to play a distinct role compared to its somato-dendritic counterpart (Figure 3), a siRNA-resistant construct would ideally require an axon-targeting sequence to restore local synthesis. As this is technically demanding, we have not yet been able to perform such an experiment, but we are actively working on identifying the optimal sequence to direct Rtn-1C to axons. Importantly, studies performing axonal KD typically rely on at least two independent siRNA sequences, thereby minimizing the likelihood that a phenotype arises from off-target effects. We have therefore validated a third siRNA (siRNA 3), which selectively downregulates Rtn-1C. Following the same experimental design as in figure 3, we performed axonal Rtn-1 KD after injury and observed that siRNA 3 also significantly increases the outgrowth of injured axons (Supplementary figure 2). This suggests that at least this phenotype is not the product of an off-target effect. In addition, pharmacological rescue with the Spastin inhibitor SPTZ mitigated both the reduction in distal axonal β3-tubulin and the increase in axon outgrowth, supporting that the observed phenotypes are unlikely to arise from off-target effects. If these effects were due to random interference with unrelated mRNA targets, inhibition of an ostensibly independent target such as Spastin would not be expected to yield such a consistent rescue. Accordingly, SPTZ treatment alone did not increase β3-tubulin, indicating that its action is specifically contingent upon Rtn-1 KD.
      Taken together, the pharmacological rescue in axons (Figure 5B) and the Rtn-1C/Spastin co-distribution in heterologous cells, which correlates with preserved microtubules (improved Figure 7), provide converging evidence to suggest that Rtn-1C-Spastin interplay may underlie the observed phenotypes in axons.

      • I find the data presented in Figure 4A/B confusing. Axonal RTN-1 KD does not reduce axonal RTN1 levels, but somatic KD does. I understand that this implies most protein comes from the soma, and the authors indeed present an explanation that increased somatic RTN1 occurs after axonal KD as a compensation mechanism. However, this can also be interpreted that there is no axonal synthesis of RTN1 after injury and axonal KD has indirect or even aspecific effects. Their model depends on this difference. Their data in Figure 6 could provide supporting evidence if it shows RTN1 puro-PLA after injury. Along these same lines, in Figure 6, they nicely include a compartment control for puro-PLA. It therefore seems doable to include a somatic puromycin control for their axonal puro-PLA, to exclude any diffusion/transport of the newly synthesized peptides. This is especially considering two recent papers reporting on this possible phenomenon, although these studies were not performed in neurons.

      __R: __We consider the possibility that there is no axonal Rtn-1 synthesis after injury to be a plausible and relevant point. Unfortunately, we could not perform a puro-PLA experiment after injury, which would have provided a more definite answer. However, we are now more confident that we regulate Rtn-1 synthesis before injury, as supported by the new supplementary figure 3D, which shows a significant decrease in puro-PLA signal (indicative of Rtn-1C synthesis) 24 hours after axonal KD. Based on the similar phenotypes observed before and after injury, we consider our results still compatible with axonal Rtn-1 synthesis being downregulated, but not absent, after injury. First, axonal Rtn-1 KD decreased β3-tubulin levels before and after injury, according to figure 5B and the improved statistical analysis performed on figure 2E. Similarly, axonal Rtn-1 KD significantly increased microtubule growth rate before and after injury according to the current statistical comparisons (Figure 5E). Second, if the β3-tubulin decrease were merely the result of unspecific siRNA targeting, it is unlikely that SPTZ treatment would increase and restore β3-tubulin levels only in the context of axonal Rtn-1 KD (Figure 5B). We have now included these considerations in the discussion (lines 537-543). Although it is a separate line of evidence, the mechanistic relationship between Rtn-1C and Spastin suggested in Figure 7 makes it more plausible that a similar control of tubulin levels occurs locally in axons.

      Following the reviewer's valuable suggestion, we performed additional quantifications in which puromycin was applied exclusively to the somatic compartment. Under these conditions, we still observed amplification in axons (~4 puncta per 1000 µm²), although this value was significantly lower than when puromycin was applied directly to axons (~10 puncta per 1000 µm²). This analysis provided a new insight into the puro-PLA technique in neurons: at least half of the signal originates in the axonal compartment, while a portion may reflect proteins synthesized in the soma and transported anterogradely to the axon through yet-unknown mechanisms (potentially involving rapid anterograde transport). Note that we revised the criteria for detecting true amplification spots based on staining without puromycin, which increased the number of true amplification spots. Still, these seemingly low values are compatible with the short labeling time (a puromycin pulse of only 15 min) and the stringent conditions of this experiment, in which secondary antibodies were avoided by directly labeling the primary ones. This approach makes the classical 30-40 nm distance for PLA even narrower, thus reducing signal. In any case, assuming a neuronal soma diameter of 15 µm (area ≈ 176.71 µm²), this yields ~0.028 puncta per µm² in somata. In comparison, axons display ~0.01 puncta per µm², approximately one-third of the soma value, which is consistent with the expected difference in ribosome density.
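      The arithmetic behind these density comparisons can be checked with a short sketch. It uses only the values quoted in the response; the variable names are illustrative, and the number of puncta per soma is back-calculated here from the quoted density rather than taken from the manuscript.

```python
import math

# Soma area, assuming a circular soma with the quoted 15 um diameter.
soma_diameter_um = 15.0
soma_area_um2 = math.pi * (soma_diameter_um / 2) ** 2   # about 176.71 um^2

# Axonal puro-PLA densities quoted in the response (puncta per um^2).
axon_density_direct = 10 / 1000     # puromycin applied to axons
axon_density_somatic = 4 / 1000     # puromycin applied only to somata

# Share of the axonal signal that is not explained by somatic labeling.
axonal_fraction = (axon_density_direct - axon_density_somatic) / axon_density_direct

# Soma density quoted in the response, and the axon-to-soma ratio.
soma_density = 0.028
axon_to_soma_ratio = axon_density_direct / soma_density  # about 0.36

# Back-calculated (not reported): puncta per soma implied by that density.
implied_puncta_per_soma = soma_density * soma_area_um2

print(f"soma area: {soma_area_um2:.2f} um^2")
print(f"axonal fraction of signal: {axonal_fraction:.2f}")
print(f"axon/soma density ratio: {axon_to_soma_ratio:.2f}")
print(f"implied puncta per soma: {implied_puncta_per_soma:.1f}")
```

      Under these numbers, roughly 60% of the axonal signal remains after subtracting the somatic-only condition, in line with the statement that at least half of the signal originates in the axonal compartment.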

      • In Figure 5A the authors find an increased co-localization (RTN1/Spastin) after axotomy. From their images, it seems that the amount of Spastin is hugely increased, which would by default increase the chance of (random) colocalization of RTN1 on Spastin. Could the authors comment on this?

      __R: __Thanks for this relevant and constructive critique. We formerly based our colocalization analysis on deconvolved images. However, after performing several quantifications with different deconvolution parameters, we were not convinced of the robustness of this finding or of the staining. We therefore performed a new set of experiments and found that non-deconvolved images from the STED microscope were more informative about the expected tubular morphology of the axonal ER. Accordingly, we improved figure 5A, and the main conclusion is now simply that both proteins are closely distributed in distal axons before and after injury.

      • In figure 5E and 5F, the condition of scr + SPTZ is omitted. What is the reason for this? The explanation of results in these figures is confusing. The authors report a 'clear trend' in increase in comet track length and lifetime upon addition of SPTZ to axonal RTN-1 KD. This is however not significant. The comparisons that are made afterwards are confusing (e.g. increase in comet lifetime of SPTZ in non-injured axons with RTN1 KD compared to Scr+DMSO and KD + DMSO in injured axons). Their conclusion is axonal RTN-1 synthesis in injured axons (see my concern in the points above on this) governs microtubule growth rate beyond Spastin activity yet blocking Spastin activity still completely blocks the effect of KD on outgrowth.

      __R: __We thank the reviewer for this observation and fully agree that the general description provided for Figure 5E was not satisfactory. We have reorganized the description of these results and performed more relevant statistical comparisons (lines 338-359). Based on the reviewer's observation, we now conclude: "Together, these results suggest that axonal Rtn-1 synthesis controls microtubule dynamics in both non-injured and injured axons, mostly independently of Spastin-mediated microtubule severing." (lines 357-359).

      Other/minor concerns:

      - The gene ontology analysis in Figure 1A contains the category 'Endoplasmic reticulum'. In this category are mainly ribosomal proteins. Although in a gene ontology analysis these proteins will be included in this category, it is misleading in this respect since they are just as likely to be coming from cytoplasmic ribosomes. Although it cannot be excluded that these are ER-bound ribosomes, not in the last place because a recent study (Koppers et al., 2024, doi: 10.1016/j.devcel.2024.05.005) found ribosomes attached to the ER in axons, I believe the category should be adapted or at the least clarified in the text.

      __R: __Many thanks for the suggestion, which is now included in the text. "Note that several of the identified transcripts in the category 'endoplasmic reticulum' code for cytoplasmic ribosomal components, which indeed can be attached to the axonal ER (Koppers et al., 2024) and be locally synthesized in axons (Shigeoka et al., 2019)." (lines 125-128)

      - Is RTN-1C isoform still an ER-shaping protein or rather an ER protein with alternative functions? The final sentence in the abstract makes a statement that a locally synthesized ER-shaping protein lessens microtubule dynamics. Could the authors provide a clearer description and discussion of the evidence in literature for this? RTN1C has been suggested to perform alternative functions in which case the statement that the local synthesis of an ER-shaping protein is important for axonal outgrowth should be adapted.

      __R: __We agree with the reviewer and are aware that some non-canonical roles of Rtn-1C may partially explain the observed phenotypes. Thus, we have rephrased the last statement of the abstract: "These findings uncover a mechanism by which axonal protein synthesis provides fine control over the microtubule cytoskeleton in response to injury." We have also modified the discussion section, introducing new references accordingly: "...Some studies have pointed to a non-canonical role for Rtn-1C in the nucleus, including DNA binding and histone deacetylase inhibition (Nepravishta et al., 2010, 2012). It is tempting to speculate that these still emerging roles may also contribute to the observed phenotypes. Of note, different axonally synthesized proteins exert transcriptional control in response to injury or local cues (Twiss et al., 2016)." (lines 576-580).

      • Is there a difference in RTN1 distribution or levels pre- and post-axotomy?

      __R: __Thanks for the suggestion. With the new analysis, we found only a slight reorganization of Rtn-1C and Spastin in distal axons (Figure 5A). We have now also included a quantification of their levels and found no significant differences for either protein (Supplementary figure 4).

      - Line 100/101 states 'the interactome of the axonal ER provides...'. To my knowledge there has been no study looking at the interactome of the axonal ER specifically. Surely axonal ER proteins are known but there is a difference.

      __R: __We agree with the reviewer that the phrase was misleading, so we have rephrased it in the introduction: "...Different lines of evidence support that the protein components of the axonal ER may interact with proteins that regulate microtubule dynamics".

      - Typo line 160 'localla'

      __R: __Thanks for taking the time, we have now corrected it.

      - In Figure S1 B, please add the DIVs to make it clearer what each graph corresponds to. The legend of S1B states different distances from the cell body but the graph shows distances from the tip.

      __R: __We have now corrected the legend accordingly.

      - Figure 2C, why does B3 tubulin decrease in soma, aspecific effect of siRNA?

      __R: __This was indeed an unexpected finding. However, we do not observe unspecific or global changes in β3-tubulin levels (see Figure 2A and Supplementary Figure 2). Considering our other results linking Rtn-1 to the regulation of the microtubule cytoskeleton, we interpret this decrease as an indirect effect of Rtn-1 depletion rather than an off-target action of the siRNA. Moreover, if the effect were unspecific, both proteins would likely be reduced in the cell body, given that the siRNA was specifically designed to target Rtn-1 as its primary sequence-specific target.

      - What is the rationale on the opposite effect found in outgrowth in Figure 3?

      __R: __The apparent opposite outcomes observed in Figure 3 - where axonal versus somatic Rtn-1 knockdown leads to divergent effects on axonal outgrowth - can be explained by compartment-specific environments and isoform distribution. The siRNA targets the conserved RHD region, reducing both Rtn-1A and Rtn-1C. Axons are enriched in Rtn-1C. Thus, axonal KD preferentially reduces Rtn-1C. In contrast, somatic KD reduces both isoforms. Rtn-1A, predominant in cell bodies, may engage other signaling pathways (Kaya et al., 2013). Interestingly, Nozumi et al. (2009b) reported that global Rtn-1 depletion reduces axonal outgrowth in developing cortical neurons. This aligns with the notion that somatic KD mimics a more global loss of function, whereas axonal KD reveals a compartmentalized, pro-regenerative effect due to local Rtn-1C regulation. (All the references indicated here are in the main manuscript.) These considerations are now included in the discussion (lines 581-593).

      - Missing word 'we' on line 194

      __R: __ We have corrected it.

      - Typo line 629 'witmn h', please proofread the entire manuscript carefully.

      __R: __We apologize for the spelling errors; we have now carefully revised the manuscript.

      - Could the authors comment on why, in Figure 7B/C, GFP only is colocalizing with Spastin-RFP? In general, GFP should be diffusive and not display punctate colocalization with Spastin.

      We appreciate the reviewer's comment. Under normal conditions, GFP displays a diffuse cytoplasmic distribution. However, in our experimental setup, we observed punctate GFP signals only in the context of co-expression with Spastin-RFP. This is consistent with prior reports showing that soluble GFP can occasionally be sequestered into late endosomal structures (Sahu et al., 2011), which are also known to harbor the M87 Spastin isoform (Allison et al., 2013; Allison et al., 2019). To rigorously exclude the possibility of unspecific fluorescence crosstalk, we independently acquired each fluorophore channel and confirmed that GFP puncta were genuine and not due to bleed-through (Supplementary Figure 5). Further, cells expressing only GFP or only Spastin-RFP did not show overlapping puncta, and co-expression of GFP with Rtn-1A-RFP did not produce any apparent overlap, indicating that the punctate GFP pattern is specifically associated with Spastin co-expression. Thus, the observed GFP colocalization with Spastin reflects a biological phenomenon potentially linked to the endosomal localization of M87 Spastin, and not an artifact of imaging or fluorophore bleed-through.

      Reviewer #2 (Significance (Required)):

      * Axonal mRNA localization and localized translation support many neuronal functions and is an important determinant of the regenerative potential of axons after injury. How this works mechanistically remains unclear. The authors present a well performed and technically challenging study in which they identify RTN-1 as a regulator of axonal outgrowth after injury. They provide evidence using experiments in microfluidic chambers that RTN1 is locally synthesized in axons. Interestingly, they identify a (local) interplay between RTN1 and Spastin which affects microtubules and thereby regulates the outgrowth of cortical axons after injury. This study provides an interesting new link between a locally synthesized protein (RTN1) and a microtubule-regulating protein Spastin that is changed upon axon injury. This provides an advance in our understanding in axon regeneration after injury and provides the basis for new studies that can further investigate this interplay. Although interesting, I have several concerns that should be clarified and are needed to substantiate the findings and model presented in this study.*

      The audience for this study will be mainly basic research in the fields of both axonal protein synthesis and axon regeneration. My expertise is in the field of mRNA localization and local protein synthesis.

      Batista, A. F. R., Martínez, J. C., & Hengst, U. (2017). Intra-axonal synthesis of SNAP25 is required for the formation of presynaptic terminals. Cell Reports, 20(13), 3085. https://doi.org/10.1016/J.CELREP.2017.08.097

      Fan, X. xuan, Hao, Y. ying, Guo, S. wen, Zhao, X. ping, Xiang, Y., Feng, F. xue, Liang, G. ting, & Dong, Y. wei. (2018). Knockdown of RTN1-C attenuates traumatic neuronal injury through regulating intracellular Ca2+ homeostasis. Neurochemistry International, 121, 19-25. https://doi.org/10.1016/J.NEUINT.2018.10.018

      Gracias, N. G., Shirkey-Son, N. J., & Hengst, U. (2014). Local translation of TC10 is required for membrane expansion during axon outgrowth. Nature Communications, 5(1), 1-13. https://doi.org/10.1038/ncomms4506

      Lucci, C., Mesquita-Ribeiro, R., Rathbone, A., & Dajas-Bailador, F. (2020). Spatiotemporal regulation of GSK3β levels by miRNA-26a controls axon development in cortical neurons. Development (Cambridge), 147(3). https://doi.org/10.1242/DEV.180232

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript investigates the relationship between the endoplasmic reticulum morphogen reticulon-1 (Rtn-1) and the microtubule severing protein spastin in axons after injury. The main message and conclusion of the paper is that local axonal synthesis of Rtn-1 plays a role in regulating the microtubule severing activity of spastin by interacting with spastin and inhibiting its activity. This mechanism would be important after injury by regulating axonal growth.

      The conclusions of the paper are based on the following claims:

      1) Rtn-1 is synthesized locally in axons.

      2) Specific downregulation of Rtn-1 in axons using microfluidic chambers affects microtubule abundance (measured by beta-3 tubulin) and promotes axon growth after injury.

      3) Inhibition of spastin MT-severing activity with a specific drug rescues the growth effect induced by axonal downregulation of Rtn-1.

      4) Rtn-1c interacts with spastin-M87 to limit its MT-severing activity in a cellular system upon overexpression.

      Major comments:

      1) Evidence that Rtn-1 is synthesized in axons comes from two experiments. Initially, the authors show that Rtn-1 siRNA transfection in the axonal compartment of microfluidic chambers reduces Rtn-1 levels in axons, suggesting that there is some local synthesis. Although this method is very attractive, I am concerned about the statistical analysis. The graphs show bars rather than individual data points from the average of many neurons (about 300). The plots also show the SEM instead of the SD, thus covering all the variability that is inherent in this type of experiment. The statistics are probably not performed on the 3 biological replicates, but consider the individual neurons as N. This is obviously not correct, since neurons in an experiment may all be affected by the same technical problem and are not independent replicates. For this reason, I am a bit skeptical about this quantification. Another problem is that the quantification of the fluorescence intensity of the sample does not take the nuclei into account. Are the nuclei removed for analysis? Are the images single planes? Addressing the quantification issues is crucial also for data in Figure 4, where the authors show a different effect of Rtn-1 axonal KD after injury.

      The second experiment is the Puro-PLA in Figure 6D. This experiment shows an average of 1.5 dots of signal per soma, which is a very low level of translation for this compartment where most of the synthesis should be taking place. In the axons, it is not clear how they calculate the axonal area. Again, the number of dots detected is very low and the physiological significance is questionable. A control with a known mRNA translated in axons would be important.

      Finally, as an important control, the authors should show the presence of Rtn-1 mRNA by FISH in their experimental system.

      R: We appreciate the critical points raised here, as they moved us to improve the quality of the findings. We analyzed cells/axons as statistical units to increase statistical power given the subtle nature of these local changes. We agree with the reviewer that this approach may increase the risk of finding false positives. To address this point, i) we plotted the individual data points and colored them according to experimental date (all dates showed a similar trend); ii) we reported SD instead of SEM; and iii) we analyzed our data using linear mixed-effects models, with experimental date included as a random effect. This approach allows us to preserve granularity and statistical power while avoiding pseudoreplication. To exclude artifactual changes, we now analyzed the intensity fold change of total fluorescence normalized to Scr. Our former quantifications were based on the corrected fluorescence intensity used to construct the plot profiles, which could have added some distortion to the measurements. These changes were applied throughout figures 2 and 4 (pages 34 and 38, respectively). After these new analyses, the formerly presented results remain valid.
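
      The difference between treating cells and treating experimental dates as the statistical unit, and between SD and SEM, can be illustrated with a minimal sketch. The values and names below (`cells_by_date`, `summarize`) are invented for illustration and are not the authors' data or code; the sketch only aggregates by date and does not fit the linear mixed-effects model mentioned in the response.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical fluorescence values (arbitrary units), keyed by experimental date.
# These numbers are invented for illustration only.
cells_by_date = {
    "date_1": [0.82, 0.91, 0.78, 0.88],
    "date_2": [0.70, 0.75, 0.73, 0.69],
    "date_3": [0.95, 0.99, 0.90, 0.97],
}


def summarize(values):
    """Return (n, mean, SD, SEM) for a list of measurements."""
    n = len(values)
    sd = stdev(values)
    return n, mean(values), sd, sd / sqrt(n)


# Every cell treated as an independent unit (risk of pseudoreplication).
all_cells = [v for cells in cells_by_date.values() for v in cells]
per_cell = summarize(all_cells)

# Each experimental date treated as the unit: one mean per biological replicate.
date_means = [mean(cells) for cells in cells_by_date.values()]
per_date = summarize(date_means)

print("per cell (n, mean, SD, SEM):", per_cell)
print("per date (n, mean, SD, SEM):", per_date)
```

      With cells as the unit, n is 12 and the SEM is smaller than the SD by a factor of sqrt(12); with dates as the unit, n drops to 3. A mixed-effects model with date as a random effect sits between these two extremes by using every cell while accounting for the shared date.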

      We thank the reviewer for raising concerns about the quantification of fluorescence intensity in cell bodies. We now specify in Materials and methods that fluorescence intensity analysis of distal axons (always isolated by the microfluidic chambers) and of cell bodies was performed using the wide-field configuration of the microscope. In all the cases, a single (epifluorescent) plane was analyzed to reflect the total fluorescence of a cell or axon. We did not exclude the nuclear region from the quantifications, as this would also remove cytoplasmic signal located above or below the nucleus.

      We also understand the concerns about the puro-PLA experiments. We agree with the reviewer that an average of 1.5 puncta per soma initially appeared low. We have identified at least three reasons for this. First, the signal derives from only a 15-minute puromycin pulse, which is a short labeling window. Second, our puro-PLA assay is particularly stringent, as ligation relied directly on puromycin- and Rtn-1C-labeled primary antibodies, without the additional spacing normally introduced by secondary antibodies. In standard PLA, the critical distance for amplification is ~30-40 nm, whereas in our assay this distance is even more restrictive. Third, in our initial analysis we applied an overly cautious threshold to define "true" amplification. We have now refined this threshold using a baseline defined by the absence of puromycin stimulation. With this improved criterion, we now quantify an average of ~5 puncta per soma and ~10 puncta per 1000 µm² of axonal area (Supplementary Figure 3D). As now described in the Methods, we calculated the axonal area by binarizing β3-tubulin staining and counted only the true amplification spots inside this region. Assuming a neuronal soma diameter of 15 µm (area ≈ 176.71 µm²), this yields ~0.028 puncta per µm² in somata. In comparison, axons display ~0.01 puncta per µm², approximately one-third of the soma value, which seems more reasonable. This is also compatible with most Rtn-1C synthesis occurring in the cell body.
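
      The density comparison above can be checked with a short calculation. This is a sketch under the response's own assumption that the soma is a flat disc 15 µm in diameter; the function and variable names are illustrative and do not come from the manuscript.

```python
import math


def disc_area(diameter_um: float) -> float:
    """Area of a circle with the given diameter, in square micrometres."""
    return math.pi * (diameter_um / 2) ** 2


# Values quoted in the response.
puncta_per_soma = 5             # ~5 puncta per soma
axon_puncta_per_1000_um2 = 10   # ~10 puncta per 1000 µm² of axonal area

soma_area = disc_area(15.0)                      # soma modelled as a 15 µm disc
soma_density = puncta_per_soma / soma_area       # puncta per µm² in somata
axon_density = axon_puncta_per_1000_um2 / 1000   # puncta per µm² in axons
ratio = axon_density / soma_density              # axon density relative to soma

print(f"soma area    = {soma_area:.2f} µm²")
print(f"soma density = {soma_density:.3f} puncta/µm²")
print(f"axon density = {axon_density:.3f} puncta/µm²")
print(f"axon / soma  = {ratio:.2f}")
```

      The ratio comes out at about 0.35, consistent with the "approximately one-third" stated in the response.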

      Unfortunately, we were not able to perform puro-PLA for other axonally synthesized proteins. Nevertheless, to further validate our puro-PLA signal, we tested the specificity of the Rtn-1C antibody we used for this assay by WB, IF, and Rtn-1 KD (Supplementary figure 3 A-C). In addition, we performed axonal Rtn-1 KD in microfluidic chambers for twenty-four hours, which elicited a significant decrease in puro-PLA signal compared to Scr (Supplementary figure 3D). Together, these results strongly indicate that the quantified signal reflects Rtn-1C synthesis. To show that Rtn-1 mRNA is present in these conditions, we have now included an RT-PCR performed on RNA isolated from the somato-dendritic and pure axonal fractions of 8 DIV microfluidic chambers (Supplementary figure 3D). Note that the presence of this mRNA in axons has been supported by several studies, one of them using cortical neurons of similar DIV cultured in microfluidic chambers (Table I and figure 1).

      2) The effects on tubulin following Rtn-1 downregulation in axons is potentially very interesting, but the authors should be careful because it could also mean that the axons are suffering. Can they also stain for other cytoskeletal markers?

      R: Regarding this concern, we are aware that in the former Figure 3 we mistakenly selected axonal fields that did not display healthy axons, which was not the dominant trend. This is supported by the lack of fragmentation and by the functional responsiveness (microtubule dynamics) shown in Figures 4 and 5B, C, E. We have now replaced the previous axonal fields in Figure 3 with more representative (healthy) axons, devoid of varicosities and fragmentation (page 37).

      3) The results using SPTZ are very interesting and implicate spastin microtubule severing activity in the observed phenotype. In my opinion, however, these experiments do not prove that "axonal Rtn-1 is indeed promoting the severing of microtubules by spastin", but simply that blocking spastin activity prevents the appearance of the microtubular phenotype (which still arises through an unknown mechanism). What happens if they try to stabilize the cytoskeleton by another means (with taxol, for example)? The authors should rephrase this conclusion.

      R: We completely agree with the reviewer's assessment. We now explicitly indicate in the main text that this is (so far in the manuscript) still a correlative phenomenon that suggests an interplay with Spastin activity: "..Together, these results suggest that locally synthesized Rtn-1 normally acts to suppress the outgrowth of injured axons, a process that could involve the microtubule-severing activity of Spastin." (lines 321-323). Later in the article, with the improved Figure 7, we further propose that these findings may reflect a causal relationship, although this mechanism has not yet been directly demonstrated in axons.

      4) The last experiment (Figure 7) that aims to connect Rtn-1 and spastin function is very artificial, since it is based on overexpression. Why should spastin M87 interact with an ER morphogen? Endogenously it is conceivable that spastin M1 which localizes to the ER would interact with Rtn-1. Moreover, this experiment needs further controls and quantifications. First, it is quite obvious from panel 7C that there is crossover of signal in the two fluorescence channels (see GFP and spastin). Controls need to be shown, where only one of the two fluorescent proteins is expressed, and the specificity of the laser is tested. This experiment is based on only 1 cell shown where co-localisation is detected based on a line that is placed in a specific area of the cell. The effects on the microtubular network needs quantification.

      R: We have now improved Figure 7 and added the requested controls to rule out crosstalk as indicated in Supplementary Figure 5 and in the main text. We agree that under normal conditions GFP should display a diffuse cytoplasmic distribution. However, in our experimental setup, we observed punctate GFP signals only in the context of co-expression with Spastin-RFP. This is consistent with prior reports showing that soluble GFP can occasionally be sequestered into late endosomal structures (Sahu et al., 2011), which are also known to harbor the M87 Spastin isoform (Allison et al., 2013; Allison et al., 2019). To exclude the possibility of unspecific fluorescence crosstalk, we independently acquired each fluorophore channel and confirmed that GFP puncta were genuine and not due to bleed-through (Supplementary Figure 5). Further, cells expressing only GFP or only Spastin-RFP did not show overlapping puncta (arrowheads), and the co-expression of GFP with Rtn-1A-RFP did not produce any apparent overlap, indicating that the punctate pattern of GFP is specifically associated with Spastin co-expression. Thus, we consider that the observed GFP colocalization with Spastin potentially reflects a true phenomenon and not an artifact of imaging or fluorophore bleed-through.

      We thank the reviewer for these observations and apologize for the confusion in the outline of the former figure 7 and the lack of a better description. As the reviewer indicates, one interesting aspect of the M87 isoform is that it lacks the ER morphogen domain (so it is soluble or cytoplasmic in principle). However, it also harbors endosome and microtubule binding domains, which according to previous literature (now included in the main text) may give it a punctate rather than a homogeneous pattern. Also, M87 is the most abundant isoform in the nervous system, particularly in early development. This is the reason why we selected this isoform to test our model. To clarify this point, we based our colocalization analysis on different cells and experimental dates and analyzed all the z-stacks for each cell (see new figure 7B and methods); the intensity plots (now removed) were only for graphical purposes. Similarly, we had already quantified the total tubulin intensity in COS cells based on many cells from different dates and included the sum projections of all the z-stacks from these cells (see new figure 7C). Thus, we removed the intensity profiles as they were clearly misleading (see new figure 7).

      We agree that over-expressing constructs may force interactions or co-distribution of proteins. However, in this case, if the observed results were mainly due to over-expression, we should see a similar trend with isoform A, as both constructs are under the control of the same strong promoter (CMV) and harbor the same ER morphogen domain (RHD). Nevertheless, the distribution of M87 tightly mirrors Rtn-1C, which is not the case for Rtn-1A. Only as a theoretical prediction, our molecular modeling suggests that Rtn-1C may be associated with Spastin through its microtubule binding domain (Figure 7E). This would imply that Spastin "decorates" ER-tubules rather than being in the same ER membranous structure. This discrete pattern of Spastin is more coherent with the distribution of both proteins that is now more clearly observed in distal axons by STED super-resolution (new figure 5A). So, although somewhat unexpected, these results suggest a novel interaction mechanism between these two proteins that deserves further validation.

      5) What is exactly the model proposed? The title implies that axonal synthesis of Rtn-1 is important during injury, but the data in the paper rather suggest that upon injury the majority of Rtn-1 is not locally synthesized. If the levels of Rtn-1 do not change, why the effect on the microtubules should be specific? Why would a siRNA against Rtn-1 in axons not affect the levels of Rtn-1, but those of tubulin? The authors should be careful, and test other control siRNAs, and Rtn-1 siRNAs, since it is well known even in more simple cellular systems that the toxicity of individual siRNAs can vary greatly.

      We consider the possibility that there is no axonal Rtn-1 synthesis after injury to be a plausible and relevant point. Unfortunately, we could not perform a puro-PLA experiment after injury, which would have provided a more definitive answer. However, we are now more confident that we regulate Rtn-1 synthesis before injury, as supported by Supplementary figure 3D, which shows a significant decrease in puro-PLA signal (indicative of Rtn-1C synthesis) 24 hours after axonal KD. Thus, based on some similar phenotypes before and after injury, we consider our results still compatible with Rtn-1 axonal synthesis being downregulated, but not fully absent (the mRNA is still detected, as described by Taylor 2009). As such, axonal Rtn-1 KD decreased β3-tubulin levels before and after injury according to figure 5B and the improved statistical analysis performed on figure 2E. Similarly, axonal Rtn-1 KD significantly increases microtubule growth rate before and after injury according to the current statistical comparisons (Figure 5E). In addition, if the β3-tubulin decrease were merely due to nonspecific siRNA targeting, it is unlikely that SPTZ treatment would restore β3-tubulin only in the context of axonal Rtn-1 KD (Figure 5B). Although on a different track, the mechanistic relationship between Rtn-1C and Spastin suggested in Figure 7 makes it more plausible that a similar phenomenon regarding the control of tubulin levels could be occurring locally in axons. We have now included these considerations in the discussion (lines 535-543).

      To rule out off-target effects, we have now validated a third siRNA sequence (siRNA 3) specifically designed against Rtn-1 and showed that it selectively downregulates Rtn-1C but not β3-tubulin in cultured cortical neurons. Then, following the same experimental framework as figure 3, we performed axonal Rtn-1 KD after injury and observed that siRNA 3 also significantly increases the outgrowth of injured axons (Supplementary figure 2). This suggests that at least this phenotype is not the product of an off-target effect. Thus, the pharmacological rescue of β3-tubulin levels by SPTZ (Figure 5B) and the Rtn-1C/Spastin co-distribution in heterologous cells, which correlates with preserved microtubules (improved Figure 7), provide converging evidence to suggest that Rtn-1C-Spastin interplay may underlie the observed phenotypes in axons.

      Minor comments:

      In Figure 5A, it would be helpful to indicate the border of the axon. The figure is not really convincing.

      Following your comments and those of the other reviewers, we have analyzed a new set of experiments regarding the STED images of non-injured and injured axons. To eliminate the risk of artifactual descriptions, we have avoided deconvolution and worked directly with raw STED images (Figure 5A). Under these conditions, the distribution and intensity of Spastin in distal axons are not modified by injury, nor are those of Rtn-1C and Spastin (Supplementary figure 4). Despite these results, the data still support that both proteins are restricted to similar subcellular domains before and after injury.

      Reviewer #3 (Significance (Required)):

      The manuscript uses complex methods to address an interesting cell biological question of relevance to understand axonal growth regulation upon injury. A limitation of the study is the statistical analysis, which triggers some doubts about the reproducibility of the data. Further experiments and the addition of controls would be important to support the claims of the authors.

    1. My husband was with us as well, and he didn't notice any switch in my English.

      This stood out to me because I definitely code switch depending on the different groups I am in.

    1. The article “Use of diverse data sources to control which topics emerge in a science map” aims to analyze the effects of different data sources on topic clustering bias in science maps. For this purpose, the clustering effectiveness of different topic categories is analyzed based on different traditional and non-traditional data sources.

      (1) contribution to existing literature

      The present research is well embedded in the existing body of literature and builds on the study Which topics are best represented by science maps? An analysis of clustering effectiveness for citation and text similarity networks by Bascur, Verberne, van Eck, and Waltman (2024). That study explored the extent to which science maps can successfully cluster documents that address the same topic - a concept referred to as clustering effectiveness. This metric serves as an indicator of the thematic precision of clustering approaches. Bascur et al. (2024) found that clustering effectiveness varies depending on the topic domain: documents related to certain topics, such as diseases, were more accurately clustered than those related to others, such as geography. Building on these findings, the present study investigates whether the clustering effectiveness for documents on the same topic is influenced by the choice of data source, and whether this effectiveness can be systematically adjusted or optimized through the selection of that source.

      (2) major strengths and weaknesses

      The article’s ideas and arguments are presented with clarity and precision. Its structure follows a classic and well-established format - introduction, background, methods, results, discussion, and conclusion - which makes it easy to follow. As a reader, I never lost the thread; the narrative remains coherent and accessible throughout. The current state of research is conveyed in a thorough, well-reasoned, and nuanced manner. Particularly noteworthy is the detailed introduction to the topics of science maps based on diverse sources and comparing clustering solutions of different networks. This contextualization is both comprehensive and essential for understanding the research that follows.

      The document selection is highly extensive (4,142,511 documents) and well-justified. The rationale for which documents are included in the study is clearly and convincingly presented. All selection criteria are explained in detail in Section 3.1.

      The introduction clearly explains the rationale for using non-traditional data sources alongside traditional data sources, and the justification is both logical and easy to follow. The external data sources are introduced and described in Section 3.2. The procedures for building the different networks (Sections 3.3 and 3.4), as well as the clustering approaches (Section 3.5), are also thoroughly explained. The topics and topic categories analyzed in the study are presented and justified in detail in Section 3.6. To evaluate how well different topics are represented within the clustering networks, the study employs the concept of clustering effectiveness. The relevant calculations are described in Section 3.7.

      The article presents its complex results in a well-structured and sensible tabular format. Figure 2 provides an example to illustrate how the results are displayed. Table 3 reports all detailed results, while Table 4 offers a summary, and Table 5 draws conclusions on which network performed best for each topic. The tables and their captions are extensive and may seem overwhelming at first glance. However, the article makes it clear that this level of detail is both intentional and necessary. The thorough descriptions guide the reader through the results and enhance comprehension. Rather than being a weakness, the comprehensive presentation reflects the authors’ careful and rigorous approach.

      Regarding additional strengths of the article, I would like to highlight and support those identified by the authors themselves. This study represents a clear advancement over the 2024 publication. By focusing on a single metric—purity, rather than also including inverse cluster number—the evaluation and interpretation of results have been significantly simplified, and comparability has improved. Whereas the earlier study only allowed comparisons between cluster solutions based on identical document sets and similar cluster sizes, the current study enables comparisons across different networks, even when they involve varying documents and cluster structures. A notable innovation in this article is the introduction of purity profiles, which effectively illustrate how clearly topic clusters would be perceived by users navigating the science map.

      In addition to highlighting the strengths of their work, the authors also acknowledge three key limitations. These include the absence of a specified minimum cluster size, the combination of bipartite and non-bipartite networks, and the potential inaccessibility of certain data sources for other researchers (e.g., due to paywalls such as those associated with the Twitter API). Each of these limitations is clearly presented and discussed in the article. The authors provide thoughtful reasoning on the impact of these constraints and explain how they have addressed them within the scope of their study.

      (3)    suggestions for improvements

      I have no suggestions for improving the article.

      (4)    data and code availability/ research ethics/ MetaROR policies

      The research data is available on Zenodo in accordance with the principle as open as possible, as closed as necessary. Due to legal restrictions, the raw data used in the experiments cannot be shared. However, the code used to conduct the experiments and generate the results is provided, along with a summary of the data utilized. This ensures transparency and allows others to understand the methodology and replicate the results, even in the absence of the original raw data.

    2. In this article, the authors present a study using different networks from various data sources to measure differences in gathering scholarly document topics and to show which networks provide the best information to represent the scientific topics considered appropriately. The work is built on a previous contribution and analyses networks obtained from six sources: scholarly document authors, Facebook users, Twitter users and conversations, patents, and policy documents. These networks are also accompanied by other networks, i.e. the text similarity network and the citation network, that are mainly used for comparison purposes.

      The work particularly interests the scholarly community, aiming to work with science map generation. However, some passages need further explanation to be clear to the reader.

      1. In the abstract, there is a mention of traditional and non-traditional data sources. While the text of the article does provide some clarification, it would be ideal to briefly explain in the abstract what the authors mean by these terms, since it is not immediately clear what a traditional data source is in the context of topic identification.

      2. In the introduction, the authors anticipate the outcomes of a previous work they have conducted on a similar topic. They claim that some topics are well-represented in maps based on citation links and text similarity, while others are not. However, it is not clear which sources they have used to get to this claim, and it is also not evident what the main difference is that characterises the current work compared to the previous one.

      3. In section 3, the authors introduce all the methods and materials used for their analysis. Although some of the material cannot be shared since it is behind a paywall (e.g. the Web of Science data), it is not clear from reading the section that all the code developed and the data obtained from the analysis have been published on Zenodo. While it is okay to address this aspect in the appropriate section at the end of the article, I would suggest anticipating this information at the beginning of section 3, citing the Zenodo record appropriately and clarifying which material is not included in that record, thus explaining that the experiment cannot be fully reproduced.

      4. Considering all the external sources of networks, it is not clear what the datetime window of each source is - are all these sources containing information from the year of publication of the oldest article in the document set considered to 2024?

      5. As far as I understood from the formula in section 3.7.1, the Purity is always calculated against a particular topic M. Thus, why not refer to such "M" in the formula definition, defining it in a function-like way Purity(N, M)? In addition, still in this section, it is not clear how the N clusters considered are selected. A running example of Purity calculation would probably help the reader here.

      6. In section 3.7.2, the denominator of the formula is set to 5. However, it is unclear why such a number is sensible for the calculation presented. Why not 6 or 7? Why not 3? I think the authors should clearly justify the choice of such a denominator by bringing in explicit evidence.

      7. In section 3.7.3, it is not entirely clear what the difference is between topics and topic categories.

      8. In the discussion, it would be good to expand a bit on the work's limitations and envision possible paths for future work in the area. A few points that I would love to see discussed in detail:

        • The analysis has been done using sources that may have changed drastically in the past months/years - e.g. Twitter, which, after becoming X, has seen a wave of departures by academics towards more open (in a broad sense) platforms and networks (e.g. Mastodon and, more recently, BlueSky). Would it be possible to gather the necessary data from these platforms to run the study again? If yes, would it be possible to download them? If not, should we consider these sources unreliable for scientific purposes and, if so, what preconditions should be in place for their reliability? Considering the present situation, what is the relevance of the results obtained with the data gathered from Twitter (now X)?

        • The authors transparently claim that some of the data used (e.g. Web of Science data) are not freely available to the reader, thus preventing the full replication of the study. Is it possible to substitute these closed sources with others offering open research information? For instance, OpenCitations for gathering the citation network (full disclosure: I'm director of OpenCitations), PubMed and PubMed Central for gathering titles and abstracts of the article considered, etc.?

        • The core set of scholarly documents considered is primarily from the biomedical domain, since the authors considered only those with a PubMed identifier specified. While the results shown are sensible for this domain, how well does the approach the authors presented scale to other scholarly areas, e.g. Social Science and Humanities? Is it possible to speculate that the approach presented is discipline-agnostic? Is there any evidence for such a claim?

      Some final remarks:

      A. The figures should be closer (i.e. maximum on the next page) to the place they are mentioned the very first time.

      B. The research question is introduced in section 1, and then it is not explicitly mentioned anymore in the text. It would be ideal to add an explicit reference to that question when the authors present appropriate evidence to answer it (e.g. in section 4) and to recall the answer to that question in the conclusion of the paper.

    1. al relationships that may arise while using social networking sites or other electronic media.

      This ethics code demonstrates the significance of honorable and sensitive use of technology in social work. Although during practice hours technology can be extremely useful for patients and for work demands, there is also a standard for the use of technology beyond work. This relates to what gets posted, how one portrays oneself on social media, how recognizable the social worker is, and whether any of this puts the client and social worker relationship at risk. As I approach the social work career, I will continue to use social media respectfully, graciously, and privately. By keeping it separate from work and upholding these standards, I am confident that with careful awareness this is achievable.

    1. Glossary: L2 & L3 have no clearly defined meaning. They represent that we may observe multiple layers of higher order code when automating. Examples could be bash scripts, configuration data, platform utility code and more. Standard's Components and their Value Contribution
    1. equences of these basic instruction

      With a sequence of instructions, the program can run and produce an output; because it uses conditional execution and repetition, some of these sequences may be repeated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 58-64):

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      • Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      • Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      • Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 84-85, 119-120). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      • Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      • Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion. We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections – see lines 346-349, 372-375.
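      The call rates quoted in the Call Rate Variation bullet follow directly from the inter-pulse intervals (rate = 1000 / IPI in msec). A minimal sketch of that arithmetic, using only the IPI values stated above; the function name is illustrative and is not taken from the authors' code:

```python
def calls_per_second(ipi_ms: float) -> float:
    """Convert an inter-pulse interval in milliseconds to a call rate in calls/s."""
    if ipi_ms <= 0:
        raise ValueError("IPI must be positive")
    return 1000.0 / ipi_ms


# IPI values quoted in the response: search, end of approach, final buzz.
for phase, ipi_ms in [("search", 100.0), ("approach (end)", 35.0), ("buzz (final)", 5.0)]:
    print(f"{phase}: IPI = {ipi_ms} ms -> {calls_per_second(ipi_ms):.1f} calls/s")
```

      A 35 msec interval gives about 28.6 calls per second, consistent with the "~28" quoted above.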

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature. 

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me. 

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight. 

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 543-545. 
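      To illustrate why higher emission frequencies give a narrower beam under a piston model, the sketch below evaluates the standard far-field directivity of a circular piston, D(θ) = |2·J1(ka·sin θ) / (ka·sin θ)|, with k = 2πf/c. It is not the authors' implementation: the emitter radius (4 mm), the two frequencies, and the speed of sound are assumed values chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def bessel_j1(x: float, terms: int = 40) -> float:
    """First-order Bessel function of the first kind, via its power series."""
    total = 0.0
    for m in range(terms):
        coeff = (-1) ** m / (math.factorial(m) * math.factorial(m + 1))
        total += coeff * (x / 2.0) ** (2 * m + 1)
    return total


def piston_directivity(freq_hz: float, angle_rad: float, radius_m: float) -> float:
    """Relative far-field amplitude of a circular piston (1.0 on axis)."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    x = k * radius_m * math.sin(angle_rad)
    if abs(x) < 1e-12:
        return 1.0  # limit of 2*J1(x)/x as x -> 0
    return abs(2.0 * bessel_j1(x) / x)


# Hypothetical emitter radius and frequencies, evaluated 30 degrees off axis.
RADIUS_M = 0.004
for f in (30_000.0, 60_000.0):
    d = piston_directivity(f, math.radians(30.0), RADIUS_M)
    print(f"{f / 1000:.0f} kHz: relative amplitude at 30 deg = {d:.2f}")
```

      With these assumed values the off-axis amplitude is lower at the higher frequency, which is the frequency dependence the response describes.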

      If so, what is the difference between phi_target and phi_tx in the model equations? 

      𝝓<sub>𝒕𝒂𝒓𝒈𝒆𝒕</sub> represents the angle between the bat and the reflected object (target).

      𝝓<sub>𝑻𝒙</sub> is the angle [rad] between the masking bat and the target (from the transmitter’s perspective).

      𝝓<sub>𝑻𝒙𝑹𝒙</sub> refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      𝝓<sub>𝑹𝒙𝑻𝒙</sub> represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 525-530). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      What is a bat's response to colliding with a conspecific (rather than a wall)? 

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldshtein et al., 2025). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance rather than modeling post-collision dynamics. See lines 479-484.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both? 

      The number of repetitions for each scenario is detailed in Table 1, but we included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 110-111):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials. 

      We clarified this in the revised text (Lines 627-628 in Statistical Analysis).
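      Because the standard errors are pooled across individuals rather than across trials, the effective sample size per scenario is the number of bats multiplied by the number of repetitions. A small sketch of that count, using the repetition numbers quoted above; it assumes each simulated bat contributes exactly one observation, which the response does not state explicitly:

```python
# Repetitions per scenario as quoted in the response (group size -> repetitions).
REPETITIONS = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}


def individuals_per_scenario(repetitions: dict[int, int]) -> dict[int, int]:
    """Number of simulated individuals pooled in each scenario (bats x repetitions)."""
    return {n_bats: n_bats * reps for n_bats, reps in repetitions.items()}


for n_bats, n_individuals in individuals_per_scenario(REPETITIONS).items():
    print(f"{n_bats:>3} bats x {REPETITIONS[n_bats]:>3} repetitions = {n_individuals} individuals")
```

      Under that assumption, the scenarios with up to 20 bats each pool 240 individuals, while the 40-bat and 100-bat scenarios pool 480 and 600 respectively.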

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that in general, our model often simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning will be repeated in several of the responses below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m²), Pipistrellus kuhlii and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation? 

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 499-508).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect? 

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase most of the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma), we also have empirical recordings of individuals flying under similar conditions (Goldshtein et al., 2025). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities. See lines 500-508.

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filterbank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in errors of 2-4 cm in localization (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see lines 548-581).
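      For orientation, the order of magnitude of the quoted Doppler shift can be checked with the standard two-way formula for an echo from a stationary reflector, f' = f·(c + v)/(c − v). The sketch below is not part of the authors' model (which omits Doppler); the emitted frequency, closing speed, and speed of sound are assumed values chosen only to illustrate the calculation.

```python
def two_way_doppler_shift_hz(freq_hz: float, closing_speed_ms: float, c_ms: float = 343.0) -> float:
    """Frequency shift of an echo heard by a sender approaching a stationary reflector.

    Uses f' = f * (c + v) / (c - v), so the shift is f * 2v / (c - v).
    """
    if not 0.0 <= closing_speed_ms < c_ms:
        raise ValueError("closing speed must be non-negative and below the speed of sound")
    return freq_hz * 2.0 * closing_speed_ms / (c_ms - closing_speed_ms)


# Hypothetical values: a 35 kHz call and a 5 m/s closing speed.
shift = two_way_doppler_shift_hz(35_000.0, 5.0)
print(f"Doppler shift: {shift:.0f} Hz")
```

      With these assumed inputs the shift comes out at roughly 1 kHz, the same order as the figure given in the response.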

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation. 

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming. 

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.  

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
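      The "3 dB per doubling" figure corresponds to an idealized gain of 10·log10(N) for N integrated calls. The sketch below simply evaluates that expression; it assumes the idealized scaling named in the exchange above and says nothing about how the authors' integration algorithm actually behaves.

```python
import math


def integration_gain_db(n_calls: int) -> float:
    """Idealized SNR gain in dB from integrating n_calls observations: 10 * log10(N)."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return 10.0 * math.log10(n_calls)


for n in (1, 2, 4, 8, 10):
    print(f"{n:>2} calls -> {integration_gain_db(n):.1f} dB")
```

      Under this assumption, a 10-call window would correspond to a gain of 10 dB over a single call.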

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem? 

      As described in the Methods section, the bat’s collision avoidance response does not solely rely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal, and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 600-616 in the revised version.
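      The step of moving detections from a bat-centered frame into a fixed x-y frame can be sketched as a rotation by the bat's heading plus a translation by its position. This is a generic illustration under assumed conventions (bearing measured from the flight direction, angles in radians, counter-clockwise positive); the function and parameter names are hypothetical and not taken from the authors' code.

```python
import math


def to_allocentric(bat_x: float, bat_y: float, heading_rad: float,
                   echo_range_m: float, echo_bearing_rad: float) -> tuple[float, float]:
    """Map a detection given as (range, bearing relative to heading) to global x-y."""
    angle = heading_rad + echo_bearing_rad  # direction of the echo in the global frame
    return (bat_x + echo_range_m * math.cos(angle),
            bat_y + echo_range_m * math.sin(angle))


# The same reflector seen from two successive positions along a straight path
# lands on the same global point, which is what makes integration across calls possible.
first = to_allocentric(0.0, 0.0, 0.0, 3.0, 0.0)
second = to_allocentric(1.0, 0.0, 0.0, 2.0, 0.0)
print(first, second)
```

      Detections that map to a consistent global position across calls can then be clustered, while inconsistent ones can be treated as outliers.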

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach. 

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      • Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text: the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m (Fujioka et al., 2021), as observed in Myotis grisescens (Sabol and Hudson, 1995) and Tadarida brasiliensis (Theriault et al., no date; Betke et al., 2008; Gillam et al., 2010).

      • Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods lines 450-455).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem. 

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler, Bioscience and 2001, no date; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 1: The impact of confusion on performance, and lines 399-404 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines 411-420 in the manuscript for further discussion. 

      We actually used logarithmically frequency modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 509-512 in Methods).
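      For readers without MATLAB, a logarithmic FM chirp can be written in closed form: the instantaneous frequency is f(t) = f0·(f1/f0)^(t/t1), and integrating it gives the phase used below. This is a minimal stdlib-only sketch of that waveform, not the authors' code; the sweep limits (80 to 40 kHz), the 3 msec duration, and the 500 kHz sample rate are assumed values for illustration only.

```python
import math


def log_chirp_frequency(t: float, f0: float, t1: float, f1: float) -> float:
    """Instantaneous frequency of a logarithmic sweep from f0 (at t=0) to f1 (at t=t1)."""
    return f0 * (f1 / f0) ** (t / t1)


def log_chirp_sample(t: float, f0: float, t1: float, f1: float) -> float:
    """Cosine-phase sample of the logarithmic chirp at time t (seconds)."""
    if f0 <= 0 or f1 <= 0 or f0 == f1:
        raise ValueError("f0 and f1 must be positive and different")
    k = math.log(f1 / f0) / t1  # exponential rate of the frequency sweep
    phase = 2.0 * math.pi * f0 * (math.exp(k * t) - 1.0) / k
    return math.cos(phase)


# Hypothetical downward sweep: 80 kHz to 40 kHz over 3 ms, sampled at 500 kHz.
F0, F1, T1, FS = 80_000.0, 40_000.0, 0.003, 500_000.0
signal = [log_chirp_sample(n / FS, F0, T1, F1) for n in range(int(T1 * FS))]
print(len(signal), log_chirp_frequency(T1 / 2, F0, T1, F1))
```

      The frequency at the midpoint of the sweep is the geometric mean of f0 and f1, which is what distinguishes a logarithmic sweep from a linear one.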

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"  :

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to support stable and realistic flight trajectories while maintaining a reasonable collision rate. These values reflect a trade-off between maneuverability and behavioral coherence under crowding. To address this point, we added a sensitivity analysis to the revised manuscript. Specifically, we tested the effect of varying the conspecific avoidance distance from 0.2 to 1.6 meters at bat densities of 2 to 40 bats/3m². The only statistically significant impact was at the highest density (40 bats/3m²), where exit probability increased slightly from 82% to 88% (p = 0.024, t = 2.25, DF = 958). No significant changes were observed in exit time, collision rate, or jamming probability across other densities or conditions (GLM, see revised Methods). These results suggest that the selected avoidance distances are robust and not a major driver of model performance, see lines 469-47.

      The 15-second exit limit was determined as described in the text (Lines 489-491): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Running such simulations to completion would lead to excessive simulation times, making it difficult to generate repetitions, while teaching us little: these cases usually resulted from the bat slightly missing the opening (see video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?  

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions?

      Does it include masking, no masking, or which species? 

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss and Surlykke, 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window - we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking. We have revised the text to clarify these details; see lines 489-491.

      Reviewer #1 (Recommendations for the authors):

      (1) Data Availability:

      As it stands now, this reviewer cannot vouch for the uploaded code as it wasn't accessible according to F.A.I.R principles. The link to the code/data points to a private company's file-hosting account that requires logging in or account creation to see its contents, and thus cannot be accessed.

      This reviewer urges the authors to consider uploading the code onto an academic data repository from the many on offer (e.g. Dryad, Zenodo, OSF). Some repositories offer an option to share a private link (e.g. Zenodo) to the folder that can then be shared only with reviewers so it is not completely public.

      This is a computational paper, and the credibility of the results is based on the code used to generate them.

      The code is available at GitHub as required:

      https://github.com/omermazar/Colony-Exit-Bat-Simulation

      (2) Abstract:

      Line 22: 'To explore whether..' - replace 'whether' with 'how'?

      The sentence was rephrased as suggested by the reviewer.

      (2) Main text:

      Line 43: '...which may share...' - correct to '...which share...', as elegantly framed in the authors' previous work - jamming avoidance is unavoidable because all FM bats of a species still share >90% of spectral bandwidth despite a few kHz shift here and there.

      The sentence was rephrased as suggested by the reviewer.

      Line 49: The authors may wish to additionally cite the work of Fawcett et al. 2015 (J. Comp. Phys A & Biology Open)

      Thank you for the suggestion. We have included a citation to the work of Fawcett et al. (2015) in the revised manuscript.

      Line 61: This statement does not match the recent state of the literature. While the previous models may have assumed that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from the potential inability to track all neighbours, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Jhawar et al. 2020 Nature Physics.

      We have added citations to the important studies suggested by the reviewer, as detailed in the Public Review above.

      Line 89: '..took all interference signals into account...' - what is meant by 'interference signals' - are the authors referring to reflections, unclear.

      We have revised the sentence and detailed the acoustic signals involved in the process: self-generated echoes, calls from conspecifics, and echoes from cave walls and other bats evoked by those calls, see lines 99-106.

      Figure 1A: The colour scheme with overlapping points makes the figure very hard to understand what is happening. The legend has colours from subfigures B-D, adding to the confusion.

      What does the yellow colour represent? This is not clear. Also, in general, the color schemes in the simulation trajectories and the legend are not the same, creating some amount of confusion for the reader. It would be good to make the colour schemes consistent and visually separable (e.g. consp. call direct is very similar to consp. echo from consp. call), and perhaps also if possible add a higher resolution simulation visualisation. Maybe it is best to separate out the colour legends for each sub-figure.

      The updated figure now includes clearer, more visually separable colors, and consistent color coding across all sub-panels. The yellow trajectory representing the focal bat’s flight path is now explicitly labeled, and we adjusted the color mapping of acoustic signals (e.g., conspecific calls vs. echoes) to improve distinction. We also revised the figure caption accordingly and ensured that the legend is aligned with the updated visuals. These modifications aim to enhance interpretability and reduce ambiguity for the reader.

      Figure C3: What is 'FB Channel', this is not explained in the legend.

      ‘FB Channel’ stands for ‘Filter Bank Channel’. This clarification has been added to the caption of Figure 1.

      Figure 3: Visually noticing that the colour legend is placed only on sub-figure A is tricky and readers may be left searching for the colour legend. Maybe lay out the legend horizontally on top of the entire figure, so it stands out?

      We have adjusted the placement of the color legend in Figure 3 to improve visibility and consistency.

      Line 141: '..the probability of exiting..' - how is this probability calculated - not clear.

      We have clarified in the revised text that the probability of exiting the cave within 15 seconds is defined as the number of bats that exited the cave within that time divided by the total number of bats in each scenario, see lines 159-160.
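
      The definition above can be illustrated with a minimal sketch. The function name `exit_probability`, the use of `None` for a bat that never exits, and the example exit times are hypothetical and are not taken from the simulation code.

```python
from typing import Optional, Sequence


def exit_probability(exit_times_s: Sequence[Optional[float]],
                     window_s: float = 15.0) -> float:
    """Fraction of bats whose exit time falls within window_s seconds.

    None marks a bat that never left the cave during the run
    (an assumed encoding, chosen only for this illustration).
    """
    if not exit_times_s:
        raise ValueError("need at least one bat")
    exited = sum(1 for t in exit_times_s if t is not None and t <= window_s)
    return exited / len(exit_times_s)


# Hypothetical scenario with five bats; three of them exit within 15 s.
example_times = [8.2, 12.9, None, 14.7, 19.3]
print(exit_probability(example_times))
```

      In this illustrative scenario three of five bats exit in time, giving a probability of 0.6.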

      Line 142: What are the sample sizes here - i.e. how many simulation replicates were performed?

      We have clarified the number of repetitions in each scenario in the revised text, as detailed in the Public Review above.

      Line 151: 'The jamming probability,...number of jammed echoes divided by the total number of reflected echoes' - it seems like these are referring to 'own' echoes or first-order reflections, it is important to clarify this.

      The reviewer is right. We have clarified it in the revised text, see lines 173-175.

      Line 153: '..with a maximum difference of ...' - how is this difference calculated? What two quantities are being compared - not clear.

      We have revised the text to clarify that the 14.3% value reflects the maximum difference in jamming probability between the RM and PK models, which occurred at a density of 10 bats. The values at each density are shown in Figure 2D, see lines 175-177.

      Line 221: '..temporal aggregation helps..' - I'm assuming the authors meant temporal integration? However, I would caution against using the exact term 'temporal integration' as it is used in the field of audition to mean something different. Perhaps something like 'sensory integration' , or 'multi-call integration'

      To avoid ambiguity and better reflect the process modeled in our work, we have replaced the term "temporal aggregation" with "multi-call integration" throughout the revised manuscript. This term more accurately conveys the idea of combining information from multiple echolocation calls without conflicting with existing terminology.

      (4) Discussion

      Line 302: 'Our model suggests...increasing the call-rate..' - not clear where this is explicitly tested or referred to in this manuscript. Can't see what was done to measure/quantify the effect of this variable in the Methods or anywhere else.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 346-349.

      Line 319: 'spatial interference' - unclear what this means. This reviewer would strongly caution against creating new terms unless there is an absolute need for it. What is meant by 'interference' in this paper is hard to assess given that the word seems to be used as a synonym for jamming and also for actual physical wave-based interference.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 119-120 and 366-367.

      Line 323: '..no benefit beyond a certain level...' - also not clear where this is explicitly tested. It seems like there was a set of simulations run for a variety of parameters but this is not written anywhere explicitly. What type of parameter search was done, was it all possible parameter combinations - or only a subset? This is not clear.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 372-375.

      Line 324: '..ca. 110 dB-SPL.' - what reference distance?

      All call levels were simulated and reported in dB-SPL, referenced at 0.1 meters from the emitting bat. We have clarified it in the revised text in the relevant contexts and specifically in line 529.
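
      For readers comparing call levels quoted at different reference distances, the following is a minimal sketch of the conversion. It assumes spherical spreading only (a 20·log10 distance law) and ignores atmospheric absorption. The function name `rereference_level` is hypothetical and is not taken from the simulation code. The sketch applies to emitted call levels; target strength follows its own convention and is not covered here.

```python
import math


def rereference_level(level_db: float, d_from_m: float, d_to_m: float) -> float:
    """Convert a level referenced at d_from_m to the equivalent level at d_to_m.

    Assumes spherical spreading only; atmospheric absorption is ignored.
    """
    if d_from_m <= 0 or d_to_m <= 0:
        raise ValueError("reference distances must be positive")
    return level_db - 20.0 * math.log10(d_to_m / d_from_m)


# A call level of 110 dB-SPL referenced at 0.1 m, re-expressed at 1 m.
print(rereference_level(110.0, 0.1, 1.0))
```

      Under this assumption, a call of about 110 dB-SPL at 0.1 m corresponds to roughly 90 dB-SPL at 1 m.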

      (5) Methods

      Line 389 : '...over a 2 x 1.5 m2 area..' It took a while to understand this statement and put it in context. Since there is no previous description of the entire L-arena, the reviewer took it to mean the simulations happened over the space of a 2 x 1.5 m2 area. Include a top-down description of the simulation's spatial setup and rephrase this sentence.

      To address the confusion, we revised the text to clarify that the full simulation environment represents a corridor-shaped cave measuring 14.5 × 2.5 meters, with a right-angle turn located 5.5 meters before the exit, as shown in Figure 1A. The 2 × 1.5 m area refers specifically to the small zone at the far end of the cave where bats begin their flight. The revised description now includes a clearer spatial overview to prevent ambiguity, see lines 456-460.

      Line 398: Replace 'High proximity' with 'Close proximity'

      Replaced.

      Line 427: 'uniform target strength of -23 dB' - at what distance is this target strength defined? Given the reference distance can vary by echolocation convention (0.1 or 1 m), one can't assess if this is a reasonable value or not.

      The reference distance for the reported target strength is 1 meter, in line with standard acoustic conventions. We have revised the text to clarify this explicitly (line 531).

      Also, independent of the reference distance, particularly with reference to bats, the target strength is geometry-dependent, based on whether the wings are open or not. Using the entire wingspan of a bat to parametrise the target strength is an overestimate of the available reflective area. The effective reflective area is likely to be somewhere closer to the surface area of the body and a fraction of the wingspan together. This is important to note and/or mention explicitly since the value is not experimentally parametrised.

      For comparison, experimentally based measurements used in Goetze et al. 2016 are -40 dB (presumably at 1 m since the source level is also defined at 1 m?), and Beleyur & Goerlitz 2019 show a range between -43 to -34 dB at 1 m.

      We agree with the reviewer that target strength in bats is strongly influenced by their geometry, particularly wing posture during flight. In our model, we simplified this aspect by using a constant target strength, as the detailed temporal variation in body and wing geometry is pseudo-random and not explicitly modeled. We acknowledge that this is a simplification, and have now stated this limitation clearly in the revised manuscript. We chose a fixed value of –23 dB at 1 meter to reflect a plausible mid-range estimate, informed by anatomical data and consistent with values reported for similarly sized species (Beleyur and Goerlitz, 2019). To support this, we directly measured the target strength of a 3D-printed RM bat model, obtaining –32 dB.

      Moreover, a sensitivity analysis across a wide range (–49 to –23 dB) confirmed that performance metrics remain largely stable, indicating that our conclusions are not sensitive to this parameter, and suggesting that our results hold for different-sized bats. See lines 384-390, 533-538, and Supplementary Figures 3 and 4 in the revised article. 

      Line 434: 'To model the bat's cochlea...'. Bats have two cochleas. This model only describes one, while the agents are also endowed with the ability to detect sound direction - which requires two ears/cochleas.... There is missing information about the steps in between that needs to be provided.

      We appreciate the reviewer’s observation. Indeed, our model is monaural, and simulates detection using a single cochlear-like filter bank receiver. We have clarified this in the revised text to avoid confusion. This paragraph specifically describes the detection stage of the auditory processing pipeline. The localization process, which builds on detection and includes directional estimation, is described in the following paragraph (see line 583 onward), as discussed in the next comment and response.

      Line 457: 'After detection, the bat estimates the range and Direction of Arrival...' This paragraph describes the overall idea, but not the implementation. What were the inputs and outputs for the range and DOA calculation performed by the agent? Or was this information 'fed' in by the simulation framework? If there was no explicit DOA step that the agent performed, but it was assumed that agents can detect DOA, then this needs to be stated.

      In the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. Instead, based on experimental studies (Simmons et al., 1983; Popper and Fay, 1995), we assumed that bats can estimate the direction of an echo with an angular error that depends on the signal-to-noise ratio (SNR). Accordingly, the inputs to the DOA estimation were the peak level of the desired echo, the noise level, and the level of acoustic interference. The output was an estimated direction of arrival that included a random angular error, drawn from a normal distribution whose standard deviation varied with the SNR. We have revised the relevant paragraph (lines 583-592) to clarify this implementation.
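
      The input/output structure described above can be sketched as follows. The mapping from SNR to the standard deviation of the error is not given in this response, so `doa_error_sd_deg` and all of its default values are placeholders chosen only for illustration. Combining noise and interference as a power sum in dB is likewise an assumption.

```python
import math
import random


def power_sum_db(levels_db):
    """Combine levels given in dB by summing their linear powers."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def doa_error_sd_deg(snr_db, sd_at_ref_deg=1.5, snr_ref_db=20.0, sd_max_deg=30.0):
    """Placeholder mapping: the error SD shrinks as SNR grows, capped at sd_max_deg."""
    sd = sd_at_ref_deg * 10.0 ** (-(snr_db - snr_ref_db) / 20.0)
    return min(sd, sd_max_deg)


def estimate_doa_deg(true_doa_deg, echo_db, noise_db, interference_db, rng):
    """Return the true direction plus a zero-mean normal error whose SD depends on SNR."""
    snr_db = echo_db - power_sum_db([noise_db, interference_db])
    return rng.gauss(true_doa_deg, doa_error_sd_deg(snr_db))


rng = random.Random(0)  # fixed seed so the illustration is repeatable
print(estimate_doa_deg(10.0, echo_db=40.0, noise_db=0.0, interference_db=25.0, rng=rng))
```

      The key design point is that the agent never computes direction from binaural cues; the simulation supplies the true direction and degrades it according to the SNR.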

      Line 464: 'To evaluate the impact of the assumption...' - the 'self' and 'non-self' echoes can be distinguished perhaps using pragmatic time-delay cues, but also using spectro-temporal differences in individual calls/echoes. Do the agents have individual call structures, or do all the agents have the same call 'shape'? The echolocation parameters for the two modelled species are given, but whether there is call parameter variation implemented in the agents is not mentioned.

      In our relatively simple model, all individuals emit the same type of chirp call, with parameters adapted only based on the distance to the nearest detected object. However, individual variation is introduced by assigning each bat a terminal frequency drawn from a normal distribution with a standard deviation of 1 kHz, as described in the revised version (lines 519-520). This small variation is not used explicitly as a spectro-temporal cue for echo discrimination.

      In our model, all spectro-temporal variations—whether due to call structure or variations resulting from overlapping echoes from nearby reflectors—are processed through the filter bank, which compares the received echoes to the transmitted call during the detection stage. As such, the detection process itself can act as a discriminative filter, to some extent, based on similarity to the emitted call.

      We acknowledge that real bats likely rely on a variety of spectro-temporal features for distinguishing self from non-self-echoes—such as call duration, received level, multi-harmonic structure, or amplitude modulation. In our simulation, we focus on comparing two limiting conditions: full recognition of self-generated echoes versus full confusion. Implementing a more nuanced self-recognition mechanism based on temporal or spectral cues would be a valuable extension for future work.

      (6) References

      Reference 22: Formatting error - and extra '4' in the reference.

      The error has been fixed.

      (7) Thoughts/comments

      Even without 'recognition' of walls & conspecifics, bats may be able to avoid obstacles - this is a neat result. Also, using their framework the authors show that successful 'blind' object-agnostic obstacle avoidance can occur only when supported by some sort of memory. In some sense, this is a nice intermediate step showing the role of memory in bat navigation. We know that bats have good long-term and long-spatial scale memory, and here the authors show that short-term spatial memory is important in situations where immediate sensory information is unreliable or unavailable.

      We appreciate the reviewer’s thoughtful summary. Indeed, one of the main takeaways of our study is that successful obstacle avoidance can occur even without explicit recognition of walls or conspecifics—provided that a clustered multi-call integration is in place. Our model shows that when immediate sensory information is unreliable, integrating detections over time becomes essential for effective navigation. This supports the broader view that memory, even on short timescales, plays an important role in bat behavior.

      (8) Reporting GLM results

      The p-value, t-statistic, and degrees of freedom are reported consistently across multiple GLM results. However, the most important part which is the effect size is not consistently reported - and this needs to be included in all results, and even in the table. The effect size provides an indicator of the parameter's magnitude, and thus scientific context.

      We agree that the effect size provides essential scientific context. In fact, we already include the effect size explicitly in Table 1, as shown in the “Effect Size” column for each tested parameter. These values describe the magnitude of each parameter’s effect on exit probability, jamming probability, and collision rate. In the main text, effect sizes are presented as concrete changes in performance metrics (e.g., “exit probability increased from 20% to 87%,” or “with a decrease of 3.5%±8% to 5.5%±5% (mean ± s.e.)”), which we believe improves interpretability and scientific relevance.  

      To further clarify this in the main text, we have reviewed the reported results and ensured that effect sizes are mentioned more consistently wherever GLM outcomes are discussed. Additionally, we have added a brief note in the table caption to emphasize that effect sizes are provided for all tested parameters.

      The 'tStat' appears multiple times and seems to be the output of the MATLAB GLM function. This acronym is specific to the MATLAB implementation and needs to be replaced with a conventionally used acronym such as 't', or the full form 't-statistic' too. This step is to keep the results independent of the programming language used.

      We have replaced all instances of tStat with the more conventional term ‘t’ throughout the manuscript to maintain consistency with standard reporting practices.

      Reviewer #2 (Recommendations for the authors):

      In addition to my public review, I had a few minor points that the authors may want to consider when revising their paper.

      (1) Figures 2, 3, and 4 may benefit from using different marker styles, in addition to different colors, to show the different cases.

      Thank you for the suggestion. In Figures 2–4, the markers represent means with standard error bars. To maintain clarity and consistency across all conditions, we have chosen to keep a standardized marker style – and we clarify this in the legend. We found that varying only the colors is sufficient for distinguishing between conditions without introducing visual clutter.

      (2) The text "PK" in the inset for Figure 2A is very difficult to read. I would suggest using grey as with "RM" in the other inset.

      We have updated the inset in Figure 2A to improve legibility.

      (3) Are the error bars in Figure 3 very small? I wasn't able to see them. If that is the case, the authors may want to mention this in the caption.

      You are correct—the error bars are present in all plots but appear very small due to the large number of simulation repetitions and low variability. We have revised the caption to explicitly mention this.

      (4) The species name of PK is spelled inconsistently (kuhli, khulli, and kuhlii).

      We have corrected the species name throughout the manuscript.

      (5) Table 1 is a great condensation of all the results, but the time to exit is missing. It may be helpful if summary statistics on that were here as well.

      We have added time-to-exit to the effect size column in Table 1, alongside the other performance metrics, to provide a more complete summary of the simulation results.

      (6) I may have missed it, but why are there two values for the exit probability when nominal flight speed is varied?

      The exit probability was not monotonic with flight speed, but rather showed a parabolic trend with a clear optimum. Therefore, we reported two values representing the effect before and after the peak. We have clarified this in the revised table and updated the caption accordingly.

      (7) Table 2 has an extra header after the page break on page 18.

      The extra header in Table 2 after the page break has been removed in the revised manuscript.

      (8) The G functions have 2 arguments in their definitions and Equation 1, but only one argument in Equations 2 and 3. I wasn't able to see why.

      Thank you for pointing this out. You are correct—this was a typographical error. We have corrected the argument notation in Equations 2 and 3 and explicitly included the frequency dependence of the gain (G) functions in both equations.

      (9) D_txrx was not defined but it was used in Equation 2.

      The variable D_txrx is defined in the equation notation section as: D_txrx – the distance [m] between the transmitting conspecific and the receiving focal bat, from the transmitter’s perspective. We have now ensured that this definition is clearly linked to Equation 2 in the revised text. Moreover, we have added a supplementary figure that illustrates the geometric configuration defined by the equations to further support clarity, as described in the Public Review above.

      (10) It was hard for me to understand what was meant by phi_rx and phi_tx. These were described as angles between the rx or tx bats and the target, but I couldn't tell what the point defining the angle was. Perhaps a diagram would help, or more precise definitions.

      We have revised the caption to provide clearer and more precise definitions. Additionally, we have included a geometric diagram as a supplementary figure, as noted in the Public Review above, to visually clarify the spatial relationships and angle definitions used in the equations, see lines 498-499.

      (11) Was the hearing threshold the same for both species?

      Yes. We have clarified it in the revised version.

      (12) Collision avoidance is described as turning to the "opposite direction" in the supplemental figure explaining the model. Is this 90 degrees or 180 degrees? If 90 degrees, how do these turns decide between right and left?

      In our model, the bat does not perform a fixed 90° or 180° turn. Instead, the avoidance behavior is implemented by setting the maximum angular velocity in the direction opposite to the detected echo. For example, if the obstacle or conspecific is detected on the bat’s right side, the bat begins turning left, and vice versa.

      This turning direction is re-evaluated at each decision step, which occurs after every echolocation pulse. The bat continues turning in the same direction if the obstacle remains in front, otherwise it resumes regular pathfinding. We have clarified this behavior in the updated figure caption and model description, see lines 478-493.
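
      The turning rule described in this response can be sketched as below. The sign conventions (positive bearing means the echo is to the right, positive rate means turning right), the handling of an echo exactly dead ahead, and the function name `avoidance_turn_rate` are assumptions made for illustration only.

```python
import math
from typing import Optional


def avoidance_turn_rate(echo_bearing_deg: Optional[float],
                        max_rate_deg_s: float,
                        previous_rate_deg_s: float = 0.0) -> Optional[float]:
    """Return the commanded angular velocity for one decision step.

    echo_bearing_deg: bearing of the obstacle echo relative to the heading,
        or None when no obstacle is detected in front.
    Returns None when the caller should resume regular pathfinding.
    """
    if echo_bearing_deg is None:
        return None
    if previous_rate_deg_s != 0.0:
        # Obstacle still in front: keep turning the same way as before.
        return math.copysign(max_rate_deg_s, previous_rate_deg_s)
    if echo_bearing_deg > 0.0:
        return -max_rate_deg_s  # echo on the right, so turn left
    # Echo on the left; a dead-ahead echo is also sent right (arbitrary choice).
    return max_rate_deg_s


# One decision per echolocation pulse: echo first on the right, then still ahead.
rate = avoidance_turn_rate(20.0, max_rate_deg_s=100.0)
rate = avoidance_turn_rate(-5.0, max_rate_deg_s=100.0, previous_rate_deg_s=rate)
print(rate)
```

      Because the rule is re-evaluated after every pulse, the total turn angle is not fixed; it depends on how many consecutive decision steps still report an obstacle in front.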

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 27-31: These sentences mischaracterize the results. This claim appears to equate "the model works" with "this is what bats actually do." Also, the model does not indicate that bats' echolocation strategies are robust enough to mitigate the effects of jamming - this is self-evident from the fact that bats navigate successfully via echolocation in dense groups.

      Thank you for the comment. Our aim was not to claim that the model confirms actual bat behavior, but rather to demonstrate that simple and biologically plausible strategies—such as signal redundancy and basic pathfinding—are sufficient to explain how bats might cope with acoustic interference in dense settings. We have revised the wording to better reflect this goal and to avoid overinterpreting the model's implications.

      See abstract in the revised version.  

      (2) Line 37: This number underestimates the number of bats that form some of the largest aggregations of individuals worldwide - the free-tailed bats can form aggregations exceeding several million bats.

      We have revised the text to reflect that some bat species, such as free-tailed bats, are known to form colonies of several million individuals, which exceed the typical range. The updated sentence accounts for these extreme cases, see lines 36-37.

      (3) The flight densities explained in the introduction and chosen references are not representative of the literature - without providing additional justification for the chosen species, it can be interpreted that the selection of the species for the simulation is somewhat arbitrary. If the goal is to model dense emergence flight, why not use a species that has been studied in terms of acoustic and flight behavior during dense emergence flights---such as Tadarida brasiliensis?

      Our goal was to develop a general model applicable to a broad class of FM-echolocating bat species. The two species we selected, Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM), span a wide range of signal characteristics: from wideband (PK) to narrowband (RM), providing a representative contrast in call structure.

      Although we did not include Tadarida brasiliensis (TB) specifically, its echolocation calls are acoustically similar to RM in terminal frequency and fall between PK and RM in bandwidth. Therefore, we believe our findings are likely to generalize to TB and other FM-bats.

      Moreover, as noted in a previous response, the average inter-bat distance in our highest-density simulations (0.27 m) is still smaller than those reported for Tadarida brasiliensis during dense emergences—further supporting the relevance of our model to such scenarios.

      To support broader applicability, we also provide a supplementary graphical user interface (GUI) that allows users to modify key echolocation parameters and explore their impact on behavior—making the framework adaptable to additional species, including TB.

      (4) Line 78: It is not clear how (or even if) the simulated bats estimate the direction of obstacles. The explanation given in lines 457-463 is quite confusing. What is the acoustic/neurological mechanism that enables this direction estimation? If there is some mechanism (such as binaural processing), how does this extrapolate to 3D?

      This comment echoes a similar concern raised by a previous reviewer. As explained earlier, in the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. The complete explanation is detailed in our response to Reviewer #1, Line 457. This implementation is now clarified in the revised text, and a detailed description of the localization process is also provided in the Methods section (lines 583-592).

      (5) The authors propose they are modeling the dynamic echolocation of bats in the simulation (line 79), but it appears (whether this is due to a lack of information in the manuscript or true lack in the simulation) that the authors only modeled a flight response. How did the authors account for bats dynamically changing their echolocation? This is unclear and from what I can tell may just mean that the bats can switch between foraging phase call types depending on the distance to a detected obstacle. Can the authors elaborate more on this?

      The echolocation behavior of the bats, including dynamic call adjustments, was implemented in the simulation and is described in detail in the Methods section (lines 498-520 and Table 2). To avoid redundancy, the Results chapter originally referred to this section, but we have now added a brief explanation in the Results to clarify that the bats’ call parameters (IPI, duration, and frequency range) adapt based on the distance to detected objects, following empirically documented echolocation phases ("search," "approach," "buzz"). These dynamics are consistent with established bat behavior during navigation in cluttered environments such as caves.

      (6) Figure 1 C3: "Detection threshold": what is this and how was it derived?

      The caption also mentions yellow arrows, but they are absent from the figure. C4: Each threshold excursion is marked with an asterisk, but there are many more excursions than asterisks. Why are only some marked? Unclear.

      C3: The detection threshold is determined dynamically. It is set to the greater of either 7 dB above the noise level (0 dB-SPL) (Kick, 1982; Saillant et al., 1993; Sanderson et al., 2003; Boonman et al., 2013) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB. This clarification has been added to the Methods section. The yellow arrow has been added.

      C4: Thank you for this important observation. Only peaks marked with asterisks represent successful detections, that is, those that were identified in both the interference-free and full detection conditions, as explained in the Methods. Other visible peaks result from masking signals or overlapping echoes from nearby reflectors, but they do not meet the detection criteria. To keep the figure caption concise, we have elaborated on this process more clearly in the revised Methods section. We added this information to the legend.
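
      The dynamic detection threshold described for panel C3 reduces to taking the larger of two levels. A minimal sketch follows; the function and parameter names are hypothetical, and the window over which the maximal received level is taken is not specified here, so the caller is assumed to supply it.

```python
def detection_threshold_db(max_received_db: float,
                           noise_db: float = 0.0,
                           margin_db: float = 7.0,
                           dynamic_range_db: float = 70.0) -> float:
    """Greater of (noise + margin) and (maximal received level - dynamic range)."""
    return max(noise_db + margin_db, max_received_db - dynamic_range_db)


# Weak strongest signal: the noise-based floor of 7 dB-SPL applies.
print(detection_threshold_db(60.0))
# Strong strongest signal: the 70 dB dynamic range raises the threshold.
print(detection_threshold_db(90.0))
```

      With the stated values, the noise-based floor governs until the strongest received signal exceeds 77 dB-SPL, after which the threshold tracks that signal at 70 dB below it.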

      (7) Figure 2: A line indicating RM, No Masking is absent

      Thank you for pointing this out. The missing line for RM, No Masking has now been added in the revised version of Figure 2.

      (8) Line 121: "reflected off conspecifics". Does this mean echoes due to conspecifics?

      The phrase "reflected off conspecifics" refers to echoes originating from the bat’s own call and reflected off the bodies of nearby conspecifics. We have clarified the wording in the revised text to avoid confusion.

      (9) Line 125: Why are low-frequency channels stimulated by higher frequencies? This needs further clarification.

      The cochlear filter bank in our model is implemented using gammatone filters, each modeled as an 8th-order Butterworth filter. Due to the non-ideal filter response and relatively broad bandwidths, especially in the lower-frequency channels, strong energy from the beginning of the downward FM chirp (at higher frequencies) can still produce residual activation in lower-frequency channels. While these stimulations are usually below the detection threshold, they may still be visible as early sub-threshold responses. Because this explanation is technical (a property of the filter implementation) and the effect does not influence the detection outcomes, we have chosen not to elaborate on it in the figure caption or Methods.

      (10) Lines 146-150: This is an interesting finding. Is there a theoretical justification for it?

      This outcome arises directly from the simulation results. As noted in the Discussion (lines 359-365), although Pipistrellus kuhlii (PK) shows a modest advantage in jamming resistance due to its broader bandwidth, the redundancy in sensory information across calls—enabled by frequent echolocation—appears to compensate for these signal differences. As a result, the small variations in echo quality between species do not translate into significant differences in performance. We speculate that if the difference in jamming probability had been larger, performance disparities would likely have emerged.

      (11) Line 151: The authors define a jammed echo as an echo entirely missed due to masking. Is this appropriate? Doesn't echo mis-assignment also constitute jamming?

      We agree that echo mis-assignment can also degrade performance; however, in our model, we distinguish between two outcomes: (1) complete masking (echo not detected), and (2) detection with a localization error. As explained in the Methods (lines 500–507), we run the detection analysis twice—once with only desired echoes (“interference-free detection”) and once including masking signals (“full detection”). If a previously detected echo is no longer detected, it is classified as a jammed echo. If the echo is still detected but the delay shifts by more than 100 µs compared to the interference-free condition, it is also considered jammed. If the delay shift is smaller, it is treated as a detection with localization error rather than full jamming. We have clarified this distinction in the revised Methods section.
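
      The two-pass comparison described above can be sketched as a small classifier. The function name, the labels, and the use of `None` for "not detected" are hypothetical; only the 100 µs delay criterion is taken from the text.

```python
from typing import Optional

DELAY_TOLERANCE_S = 100e-6  # 100 microseconds, as stated in the response


def classify_echo(clean_delay_s: Optional[float],
                  full_delay_s: Optional[float]) -> str:
    """Compare one echo across the interference-free and full detection passes.

    Each argument is the detected delay in seconds, or None if the echo
    was not detected in that pass.
    """
    if clean_delay_s is None:
        # Never detectable even without interference: not a jamming event.
        return "not_detectable"
    if full_delay_s is None:
        return "jammed"  # fully masked by interfering signals
    if abs(full_delay_s - clean_delay_s) > DELAY_TOLERANCE_S:
        return "jammed"  # detected, but the delay shifted too far
    return "detected"    # detected, possibly with a small localization error


# Hypothetical (interference-free, full) delay pairs in seconds.
pairs = [(0.0050, 0.0050), (0.0050, None), (0.0050, 0.0053), (0.0050, 0.00505)]
print([classify_echo(clean, full) for clean, full in pairs])
```

      Only the first and last illustrative pairs count as detections; the second is masked outright and the third shifts by 300 µs, so both are classified as jammed.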

      (12) Figure 2-E: Detection probability statistics are of limited usefulness without accompanying false alarm rate (FAR) statistics. Do the authors have FAR numbers?

      We understand FAR to refer to instances where masking signals or other acoustic phenomena are mistakenly interpreted as real echoes from physical objects. As explained in the manuscript, we implemented two model versions: one without confusion, and one with full confusion.

      Figure 2E reports detection performance under the non-confusion model, in which only echoes from actual physical reflectors are used, and no false detections occur—hence, the false alarm rate is effectively zero in this condition. In the full-confusion model, all detected echoes—including those originating from masking signals or conspecific calls—are treated as valid detections, which may include false alarms. However, we did not explicitly quantify the false alarm rate as a separate metric in this simulation.

      We agree that tracking FAR could be informative and will consider incorporating it into future versions of the model.

      (13) Line 161: RM bats suffered from a significantly higher probability of the "desired conspecific's echoes" being jammed. What does "desired conspecific's echoes" mean? This is unclear.

      The term “desired conspecific's echoes” refers to echoes originating from the bat’s own call, reflected off nearby conspecifics, which are treated as relevant reflectors for collision avoidance. We have revised the wording in the text for clarity.

      (14) Line 188: Why didn't the size of the integration window affect jamming probability? I couldn't find this explained in the discussion.

      The jamming probability in our analysis is computed at the individual-echo level, prior to any temporal integration. Since the integration window is applied after the detection step, it does not influence whether a specific echo is masked (i.e., jammed) or not. Therefore, as expected, we did not observe a significant effect of integration window size on jamming probability.

      (15) Line 217-218: Why do the authors think this would be?

      Thank you for the thoughtful question. We agree that, in theory, increasing call intensity should raise the levels of both desired echoes and masking signals proportionally. However, in our model, the environmental noise floor and detection threshold remain constant, meaning that higher call intensities increase the signal-to-noise ratio (SNR) more effectively for weaker echoes, especially those at longer distances or with low reflectivity. This could lead to a higher likelihood of those echoes crossing the detection threshold, resulting in a small but measurable reduction in jamming probability.

      Additionally, the non-linear behavior of the filter-bank receiver, including thresholding at multiple stages, can introduce asymmetries in how increased signal levels affect the detection of target versus masking signals.

      That said, the effect size was small, and the improvement in jamming probability did not translate into any significant gain in behavioral performance (e.g., exit probability or collision rate), as shown in Figure 3C.

      (16) Line 233: I'm not sure I understand how a slightly improved aggregation model that clustered detected reflectors over one-second periods is different. Doesn't this just lead to on average more calls integrated into memory?

      While increasing the memory duration does lead to more detections being available, the enhanced aggregation model (which we now refer to as multi-call clustering) differs fundamentally from the simpler one. As detailed in the Methods, it includes additional processing steps: clustering spatially close detections, removing outliers, and estimating wall directions based on the spatial structure of clustered echoes. In contrast, the simpler model treats each detection as an isolated point without estimating obstacle orientation. These additional steps allow for more robust environmental interpretation and significantly improve performance under high-confusion conditions. We have clarified this in the revised text (lines 606-616) and added Supplementary Figure 2B.
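As an illustration only, the three steps named above (clustering nearby detections, removing outliers, estimating wall direction) can be sketched in a few lines of Python. This is not the simulation's code: the greedy single-linkage rule, the principal-axis estimate, and the parameters `max_gap` and `min_points` are all assumptions chosen to keep the example small.

```python
import math

def cluster_detections(points, max_gap=0.3):
    """Greedy single-linkage clustering of 2D detections (x, y) in metres.

    max_gap is a hypothetical distance threshold, not a value from the paper.
    """
    clusters = []
    for p in points:
        merged, rest = [p], []
        for c in clusters:
            # Any cluster with a member within max_gap of p is merged with p.
            if any(math.dist(p, q) <= max_gap for q in c):
                merged.extend(c)
            else:
                rest.append(c)
        clusters = rest + [merged]
    return clusters

def drop_outliers(clusters, min_points=3):
    """Treat clusters with fewer than min_points detections as outliers (assumed rule)."""
    return [c for c in clusters if len(c) >= min_points]

def wall_direction_deg(cluster):
    """Orientation of the cluster's principal axis, in degrees within [0, 180)."""
    n = len(cluster)
    mx = sum(x for x, _ in cluster) / n
    my = sum(y for _, y in cluster) / n
    sxx = sum((x - mx) ** 2 for x, _ in cluster)
    syy = sum((y - my) ** 2 for _, y in cluster)
    sxy = sum((x - mx) * (y - my) for x, y in cluster)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.degrees(theta) % 180

# Four detections along a wall at y = 2 m plus one stray detection.
detections = [(0.0, 2.0), (0.2, 2.0), (0.4, 2.0), (0.6, 2.0), (5.0, 5.0)]
walls = drop_outliers(cluster_detections(detections))
directions = [wall_direction_deg(c) for c in walls]
```

In this toy input the stray detection forms a single-point cluster and is discarded, while the remaining cluster yields one direction estimate. The simpler model described above would instead pass all five points on unchanged.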

      (17) Table 1: What about conspecific target strength?

      We have now added the conspecific target strength as a tested parameter in Table 1, along with its tested range, default value, and measured effect sizes. A detailed sensitivity analysis is also presented in Supplementary Figure 4, demonstrating that variations in conspecific target strength had relatively minor effects on performance metrics.  

      (18) Figure 3-A: The x-axis is the number of calls in the integration window. But the leftmost sample on each curve is at 0 calls. Shouldn't this be 1?

      “0 calls” refers to the case where only the most recent call is used for pathfinding—without integrating any information from prior calls. The x-axis reflects the number of previous calls stored in memory, so a value of 0 still includes the current call. We’ve clarified this terminology in the figure caption.

      (19) Lines 282-283: This statement needs to be clarified that it is with the constraints of using a 2D simulation with at most 33 bats/m^2. It also should be clarified that it is assumed the bat can reliably distinguish between its own echoes and conspecific echoes, which is a very important caveat.

      We have revised the text to clarify that the results are based on a 2D simulation with a maximum tested density of 33 bats/m². We also now explicitly state that the model assumes bats can distinguish between their own echoes and those generated by conspecifics, an assumption we recognize as a simplification. These clarifications help place the results within the scope and constraints of the simulation. Moreover, as described in the text (and noted in a previous response), the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m.

      (20) Line 294: What is this sentence referring to?

      The sentence refers to the finding that, even under high bat densities, a substantial portion of the echoes, particularly those reflected from nearby obstacles (e.g., 1 m away), were jammed due to masking. Nevertheless, the bats in the simulation were still able to navigate successfully using partial sensory input. We have clarified the sentence in the revised text to make this point more explicit (see lines 333-336).

      (21) Line 302: Was jamming less likely when IPI was higher or lower? I could not find this demonstrated anywhere in the manuscript.

      We agree that the original text was not sufficiently clear on this point. While we did not explicitly test fixed IPI values as a parameter, the model does simulate the natural behavior of decreasing IPI as bats approach obstacles. This behavior is supported by empirical observations and is incorporated into the echolocation dynamics of the simulation. We have clarified this point in the revised text (see Lines 346-351) and explained that while lower IPI introduces more acoustic overlap, it also increases redundancy and improves detection through temporal integration.

      (22) Lines 313-314: This is an interesting assumption, but it is not evident that is substantiated by the references.

      The claim is based on well-established principles in signal processing and bioacoustics. Wideband signals—such as those emitted by PK bats— distribute their energy over a broader frequency range, which makes them inherently more resistant to narrowband interference and masking. This concept is commonly applied in both biological and artificial sonar systems and is supported by empirical studies in bats and theory in acoustic sensing.

      For example, Beleyur & Goerlitz (2019) demonstrate that broader bandwidth calls improve detection in cluttered and jamming-prone environments. Similarly, Ulanovsky et al. (2004) and Schnitzler & Kalko (200) discuss how FM bats' wideband calls enhance temporal and spatial resolution, helping to reduce the impact of overlapping signals from conspecifics. These findings align with communication theory where spread-spectrum techniques improve robustness in noisy environments.

      We agree with the reviewer that this is an important point, and we have updated the manuscript to clarify this rationale and cite the relevant literature accordingly (lines 631-363).

      (23) Lines 318-319: What is the justification for "probably"? Isn't this just a supposition?

      We agree with the reviewer’s point and have rephrased the sentence.

      (24) Line 320: How does this 63% performance match the sentence in line 295?

      The sentence in Line 295 refers to the overall ability of the bats to navigate successfully despite high jamming levels, highlighting the robustness of the strategy under challenging conditions. The figure in Line 320 (63%) quantifies this performance under the most extreme simulated scenario (100 bats / 3 m²), where both spatial and acoustic interferences are maximal. We have rephrased the text in the revised version (lines 324-327).

      (25) Lines 341-345: It seems like this is more likely to be the main takeaway of the paper.

      As noted in the Public Review above, there is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from those of conspecifics (e.g., Schnitzler, Bioscience, 2001; Kazial et al., 2001, 2008; Burnett & Masters, 2002; Chiu et al., 2009; Yovel et al., 2009; Beetz & Hechavarría, 2022). Therefore, we consider our assumption of self-recognition to be well-supported, at least under typical conditions. That said, we agree that the impact of echo confusion on performance is significant and highlights a critical challenge in dense environments.

      To our knowledge, this is the first computational model to explicitly simulate both self-recognition and full echo confusion under high-density conditions. We believe that the combination of modeled constraints and the demonstrated robustness of simple sensorimotor strategies, even under worst-case assumptions, is what makes this contribution both novel and meaningful.

      (26) Lines 349-350: What is the aggregation model? What is meant by "integration"?

      We have revised the text to clarify that the “aggregation model” refers to a multi-call clustering process that includes clustering of detections, removal of outliers, and estimation of wall orientation, as described in detail in the revised Methods and Results sections.

      (27) Line 354: Again, why isn't this the assumption we're working under?

      As addressed in our response to Comment 25, our primary model assumes that bats can recognize their own echoes—an assumption supported by substantial empirical evidence. The alternative "full confusion" model was included to explore a worst-case scenario and highlight the behavioral consequences of failing to distinguish self from conspecific echoes. We assume that real bats may experience some degree of echo misidentification; however, our assumption of full confusion represents a worst-case scenario.

      (28) Line 382: "Under the assumption that..." I agree that bats probably can, but if we assume they can differentiate them all, where's the jamming problem?

      The assumption that bats can theoretically distinguish between different signal sources applies after successful detection. However, the jamming problem arises during the detection and localization stages, where acoustic interference can prevent echoes from crossing the detection threshold or distort their timing.

      (29) Lines 386-387: The paper referenced focused on JAR in the context of foraging. What changes were made to the simulation to switch to obstacle avoidance?

      While the simulation framework in Mazar & Yovel (2020) was developed to study jamming avoidance during foraging, the core components—such as the acoustic calculations, receiver model, and echolocation behavior—remain applicable. For the current study, we adapted the simulation extensively to address colony-exit behavior. These modifications include modeling cave walls as acoustic reflectors, implementing a pathfinding algorithm, integrating obstacle-avoidance maneuvers, and adapting the integration window and integration processes. These updates are detailed throughout the Methods section.

      (30) Line 400-402: Something doesn't add up with the statement: each decision relies on an integration window that records estimated locations of detected reflectors from the last five echolocation calls, with the parameter being tested between 1 and 10 calls. Can the authors reword this to make it less confusing?

      We have reworded the sentence to clarify that the default integration window includes five calls, while we systematically tested the effect of using 1 to 10 calls, see lines 486-487.

      (31) Line 393: "30 deg/sec" why was this value chosen?

      The turning rate of 30 deg/sec was manually selected to approximate the curvature of natural foraging flight paths observed in Rhinopoma microphyllum using on-board tags. Moreover, in Mazar & Yovel (2020), we showed that the flight dynamics of simulated bats in a closed room closely matched those of Pipistrellus kuhlii flying in a room of similar dimensions. However, in the current simulation, bats rarely follow a random-walk trajectory due to the structured environment and frequent obstacle detection. As a result, this parameter has no meaningful impact on the simulation outcomes.

      (32) Line 412: "Harmony" --- do you mean harmonic? And what is the empirical evidence that RM bats use the 2nd harmonic compared to the 1st?

      Perhaps showing a spectrogram of a real RM signal would be helpful.

      The typo has been corrected. For reference, see Goldshtein et al. (2025).

      (33) Table 2: Something is incorrect with the table. The first row on the next page is the wrong species name. Also, where are the citations for these parameter values?

      The table header has been corrected in the revised version. The parameter values for flight and echolocation behavior were derived from existing literature and empirical data: Pipistrellus kuhlii parameters were based on Kalko (1995), and Rhinopoma microphyllum parameters were extracted from our own recordings using on-board tags, as described in Goldshtein et al. (2025). We have added the appropriate citations to Table 2.

      (34) Line 442: How was the threshold level chosen?

      The detection threshold in each level is set to the greater of either 7 dB above the noise level (0 dB-SPL) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB.
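The threshold rule stated above can be written out as a short Python sketch. The function and constant names are illustrative and are not taken from the simulation code. The three numbers (0 dB SPL noise level, 7 dB margin, 70 dB dynamic range) are the ones given in the response.

```python
NOISE_LEVEL_DB_SPL = 0.0      # noise level stated in the response
MARGIN_ABOVE_NOISE_DB = 7.0   # threshold sits at least 7 dB above the noise
DYNAMIC_RANGE_DB = 70.0       # or 70 dB below the strongest received signal

def detection_threshold_db(max_received_level_db):
    """Return the greater of the noise-based and the dynamic-range-based threshold."""
    noise_based = NOISE_LEVEL_DB_SPL + MARGIN_ABOVE_NOISE_DB
    range_based = max_received_level_db - DYNAMIC_RANGE_DB
    return max(noise_based, range_based)

# A weak scene (strongest signal 60 dB SPL) is limited by the noise floor,
# while a loud scene (strongest signal 100 dB SPL) is limited by dynamic range.
quiet_threshold = detection_threshold_db(60.0)
loud_threshold = detection_threshold_db(100.0)
```

With these inputs the noise-based term (7 dB SPL) applies in the first case and the dynamic-range term (30 dB SPL) in the second.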

      (35) Line 445: 100 micros: This is about 3cm. The resolution of PK is about 1cm. For RM it's about 10cm. So, this window is generous for PK, but too strict for RM.

      To keep the model simple and avoid introducing species-specific detection thresholds, we selected a biologically plausible compromise that could reasonably apply to both species. This simplification ensures consistency across simulations while remaining within the known behavioral range.
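For readers who want to check the time-to-distance conversion behind this exchange, a minimal Python sketch follows. It assumes a speed of sound of 343 m/s (dry air at roughly 20 °C), a value not stated in the response. Under that assumption, a 100 µs window corresponds to about 3.4 cm of acoustic path, or about 1.7 cm of target range once the two-way travel of an echo is taken into account.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C

def path_length_m(window_s):
    """Distance sound travels during the window (one-way path)."""
    return SPEED_OF_SOUND_M_S * window_s

def target_range_m(window_s):
    """Range difference for an echo, which travels to the target and back."""
    return SPEED_OF_SOUND_M_S * window_s / 2

window_s = 100e-6  # the 100 microsecond window discussed above
one_way_cm = path_length_m(window_s) * 100
two_way_cm = target_range_m(window_s) * 100
```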

      (36) Line 448: What is the spectrum of the Gaussian noise, and did it change between PK and RM?

      We used the same white Gaussian noise with a flat spectrum across the relevant frequency range (10–80 kHz) for both species. We have clarified this in the revised text in lines 570-572.

      (37) Line 451: 4 milliseconds is 1.3m. Is this appropriate?

      The 4-millisecond window was selected based on established auditory masking thresholds described in Mazar & Yovel (2020) and supported by Popper and Fay (1995, ch. 2.4.5), Blauert (1997, ch. 3.1), and Mohl and Surlykke (1989). These values provide conservative lower bounds on bats’ ability to cope with masking (Beleyur and Goerlitz, 2019). For simplicity, we used constant thresholds within each window; see lines 574-576.

      (38) Line 452: Citation for the forward and backward masking durations?

      See the response to the previous comment.

      (39) Lines 460-461: This is unclear. How does the bat get directional information? The authors claim to be able to measure direction-of-arrival for each detection, but it is not clear how this is done

      As noted in our response to Reviewer 1 (Comment on Line 457), directional information is not computed via an explicit binaural model. Instead, we assume the bat estimates the direction of arrival with an angular error that depends on the SNR, based on established studies (e.g., Simmons et al., 1983; Popper & Fay, 1995). We have clarified this in the revised text in lines 583-592.

      (40) Line 467: It seems like the authors are modeling pulse-echo ambiguity, at least in this one alternative model, which is good! However the alternative model doesn't get much attention in the paper. Is there a reason for this?

      We would like to clarify that we did not model pulse-echo ambiguity. In our confusion model, all echoes received within the IPI are attributed to the bat’s most recent call. This includes echoes that may in fact originate from conspecific calls, but the model does not assign self-echoes to earlier pulses or span multiple IPIs. Therefore, while the model captures echo confusion, it does not include true pulse-echo ambiguity. We have clarified this point in the revised text in lines 551-553.
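The attribution rule described above can be illustrated with a small Python sketch. It is a simplified stand-in rather than the simulation's implementation: the function name is hypothetical, the speed of sound (343 m/s) is an assumed value, and echoes are represented only by their arrival times.

```python
import bisect

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C

def apparent_ranges(call_times_s, echo_arrivals_s):
    """Attribute each received echo to the bat's most recent own call.

    Returns (call_time, apparent_range_m) pairs. An echo that actually came from
    a conspecific call is still ranged against the bat's own latest call, which
    is the confusion described above. No echo is assigned to an earlier call.
    """
    calls = sorted(call_times_s)
    result = []
    for t in echo_arrivals_s:
        i = bisect.bisect_right(calls, t) - 1  # latest call emitted at or before t
        if i < 0:
            continue  # arrived before any own call, so there is nothing to attribute
        delay_s = t - calls[i]
        result.append((calls[i], SPEED_OF_SOUND_M_S * delay_s / 2))
    return result

# Own calls at 0 ms and 100 ms; echoes arrive at 10 ms and 105 ms.
ranges = apparent_ranges([0.0, 0.1], [0.010, 0.105])
```

The echo at 105 ms is ranged against the call at 100 ms even if it physically belonged to an earlier emission, so the sketch never spans more than one IPI.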

      (41) Line 41: "continuous" is more appropriate than "constant".

      Thank you, we have rephrased the text accordingly.

      (42) Line 69: "band width" should be one word.

      Thank you, we have corrected it to “bandwidth”.

      (43) Line 79: "bats" should be in the possessive.

      Thank you, the text has been rephrased.

      (44) Line 128: "convoluted" don't you mean "convolved"?

      We have replaced “convoluted” with the correct term “convolved” in the revised text.

      (45) Please check your references, as there are some incomplete citations and typos.

      Thank you, we have reviewed and corrected all references for completeness and consistency.

      References

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/FNCIR.2022.899370/BIBTEX.

      Beleyur, T. and Goerlitz, H.R. (2019) ‘Modeling active sensing reveals echo detection even in large groups of bats’, Proceedings of the National Academy of Sciences of the United States of America, 116(52), pp. 26662–26668. Available at: https://doi.org/10.1073/pnas.1821722116.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Blauert, J. (1997) ‘Spatial Hearing: The Psychophysics of Human Sound Localization (rev. ed.)’.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/JEB.204255/VIDEO-3.

      Boonman, A. et al. (2013) ‘It’s not black or white-on the range of vision and echolocation in echolocating bats’, Frontiers in Physiology, 4 SEP(September), pp. 1–12. Available at: https://doi.org/10.3389/fphys.2013.00248.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Chiu, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldshtein, A. et al. (2025) ‘Onboard recordings reveal how bats maneuver under severe acoustic interference’, Proceedings of the National Academy of Sciences, 122(14), p. e2407810122. Available at: https://doi.org/10.1073/PNAS.2407810122.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3–4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4 APR(April), pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, Proceedings of the National Academy of Sciences, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kick, S.A. (1982) ‘Target-detection by the echolocating bat, Eptesicus fuscus’, Journal of Comparative Physiology A, 145(4), pp. 431–435. Available at: https://doi.org/10.1007/BF00612808/METRICS.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5 MAY. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Mohl, B. and Surlykke, A. (1989) ‘Detection of sonar signals in the presence of pulses of masking noise by the echolocating bat, Eptesicus fuscus’, pp. 119–124.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Popper, A.N. and Fay, R.R. (1995) Hearing by Bats. Springer-Verlag.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency- modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/PNAS.2011719117/SUPPL_FILE/PNAS.2011719117.SAPP.PDF.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), p. 557. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat, Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. et al. (1983) ‘Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus’.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’, cs-web.bu.edu [Preprint]. Available at: https://csweb.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. As an example, I recently wanted to subscribe to the RawTools newsletter. When I went to their newsletter subscription page, I noticed that their URL looked like this: https://rawtools.us11.list-manage.com/subscribe?u=00722345fc94fb4d4b323edc3&id=4ff553ba3e If you can find a URL from a Mailchimp email campaign in a format like this, you can usually use it to get its respective RSS feed. There are 3 pieces we need in order to find this list’s RSS feed, and all of them we can find in this URL: us11 - This appears to be the Mailchimp server location associated with the mailing list’s account. u=00722345fc94fb4d4b323edc3 - I think this is a user identification code? Not sure. We need it, though! id=4ff553ba3e - Again, not 100% sure what this is; possibly a list id? We need it too, regardless ¯\_(ツ)_/¯ Once we’ve got those pieces, we can use them to construct our RSS feed. A Mailchimp list’s RSS feed looks like this: https://[SERVER LOCATION CODE].campaign-archive.com/feed?u=["u" CODE]&id=["id" CODE] The campaign-archive and /feed parts are the important parts that need to be switched out here. So, we put all those pieces together, and end up with the following feed URL: https://us11.campaign-archive.com/feed?u=00722345fc94fb4d4b323edc3&id=4ff553ba3e Then, adding that into an RSS reader app gives us the last few campaign emails that were sent out from that list, as well as allows us to be notified of future emails without it cluttering up our email inbox.

      Mailchimp has secret RSS
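The URL rewrite described in the quoted passage can be sketched as a small Python helper. This is a rough sketch that relies only on the pattern shown in the quote. The function name is made up for illustration, and it assumes that the server code is the label directly before `list-manage.com` and that the `u` and `id` query parameters are both present.

```python
from urllib.parse import urlparse, parse_qs, urlencode

def mailchimp_feed_url(subscribe_url):
    """Build a campaign-archive RSS feed URL from a list-manage.com subscribe URL."""
    parts = urlparse(subscribe_url)
    labels = (parts.hostname or "").split(".")
    # Expected host shape: [account.]<server>.list-manage.com
    if len(labels) < 3 or labels[-2:] != ["list-manage", "com"]:
        raise ValueError("not a list-manage.com URL")
    server = labels[-3]  # e.g. "us11"
    query = parse_qs(parts.query)
    if "u" not in query or "id" not in query:
        raise ValueError("URL is missing the 'u' or 'id' parameter")
    params = urlencode({"u": query["u"][0], "id": query["id"][0]})
    return f"https://{server}.campaign-archive.com/feed?{params}"

feed = mailchimp_feed_url(
    "https://rawtools.us11.list-manage.com/subscribe"
    "?u=00722345fc94fb4d4b323edc3&id=4ff553ba3e"
)
```

For the RawTools URL in the quote, the helper returns the same feed URL that the passage constructs by hand.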

    1. Assistant linked to federal registration and award history for Arayna Technology Solutions (UEI LQKTFZMD19B3), update here. Upload a capability statement or catalogue for your company here or upload an opportunity-specific document here Some suggestions:Please summarize the work to be completed under this opportunityDo the documents mention an incumbent contractor?Does this contract have any security clearance requirements?I'd like to anonymously submit a question to the procurement officer(s) Do they need certified people or something similar Loading Yes, the solicitation explicitly requires that all primary technical services be performed by individuals who are Mitel/Unify certified, qualified, and experienced in performing maintenance and repair of the specified equipment. Specifically, the contractor must:- Be a Mitel/Unify Systems certified partner authorized to perform on-network installations and maintenance.- Ensure that technicians have a complete understanding of system design, installation diagrams, and maintenance records for the equipment listed in the PWS.- Meet the following Air Force requirements for access: - Air Force Cyber Readiness Training (Annual) - CompTIA Security+ CertificationThese requirements are detailed in Section 4.0 of the [Performance Work Statement (PWS)](https://www.highergov.com/document/attachment-1-ellsworth-28th-bw-unify-sustainment-pws-pdf-4c2c3b/). Submit × Opportunity Assistant Document Library Capability Statements Opportunity Documents Drop or click to upload a Capability Statement (.pdf, .docx, or .xlsx) Close Select Document for Processing Select a document for analysis. Submit Match Score Factors Contributing to Match Registered with primary NAICS 541519 matches this opportunity. 
Registered for NAICS 541519 - Other Computer Related Services This opportunity is open to competition Potential Issues or Gaps Minimal or no matching keywords found for your capabilities May have limited or no recent experience with NAICS 541519 - Other Computer Related Services Registration not found for PSC DG10 - IT And Telecom - Network As A Service May have limited or no recent experience with PSC DG10 - IT And Telecom - Network As A Service Little or no prior performance with Air Combat Command This opportunity was marked No Bid by: Arayna Description Original Summary Original Summary The 28th Contracting Squadron, Ellsworth AFB, South Dakota, has issued this Solicitation, FA469025Q0062, to compete and award a Firm-Fixed Price Contract for sustainment and repair for the Mitel/Unify OpenScape Systems located at Ellsworth Air Force Base, South Dakota. This requriement is being solicited Full & Open (No Small Business Set-Aside), limited to Unify OpenScape Brand Name items and certified Technicians. Attached to this Solicitation Notice are: Standard Form 1449 Solicitation Attachment 1 - Performance Work Statement (PWS) Attachment 2 Wage Determination All questions and comments on this solicitation must be submitted in writing to joshua.johnson.233@us.af.mil no later than Wednesday, 10 September 2025 by 12:00pm MDT. Please title all emails with questions with the following subject line: FA469025Q0062 - Unify Mainteance. All interested and responsible entities are invited to submit a quote that will be considered by the 28th Contracting Office at Ellsworth AFB, South Dakota. The award will be based on the criteria established in the solicitation. Vendor quotes and all items required as listed within the Addendum to 52.212-1, Instruction to Offerors, are due to be submitted no later than Wednesday, 17 September 2025 by 3:00pm MDT to joshua.johnson.233@us.af.mil. 
Please title all submissions with the following subject line: FA469025Q0062 - Unify Maintenance. Contractors submitting a quote must have, and list within the quote, their assigned CAGE code and be registered and ACTIVE in the System for Award Management (SAM) at www.sam.gov to be eligible for award.

Auto-generated summaries available on select opportunities

Background
The 28th Contracting Squadron at Ellsworth Air Force Base, South Dakota, is issuing Solicitation FA469025Q0062 for a Firm-Fixed Price Contract aimed at the sustainment and repair of Mitel/Unify OpenScape Systems. This requirement is open to all vendors (Full & Open) but is limited to Unify OpenScape Brand Name items and certified technicians. The goal of this contract is to ensure the operational integrity of the telecommunications systems at the base.

Work Details
The contractor shall provide all personnel, equipment, tools, materials, supervision, and any other items and services necessary to ensure that the Unify system is operational. Key tasks include:
- Performing maintenance and repair on Mitel/Unify telecommunications hardware and software installed throughout Ellsworth AFB.
- Diagnosing and resolving system issues for applications such as OpenScape Voice, OpenScape Xpert, iNemsoft radio interface, and ASC voice.
- Ensuring that all technical services are performed by individuals who are certified, qualified, and experienced in Mitel/Unify systems.
- Sustaining and repairing the specific equipment listed in Section 6.0 Equipment List of the Performance Work Statement (PWS).

Period of Performance
The contract will have a base period of 12 months with four additional option years, each lasting 12 months.

Place of Performance
Ellsworth Air Force Base, South Dakota.

Overview
Agency: Air Combat Command (ACC) [DoD - USAF]
Response Deadline: Sept. 17, 2025, 5:00 p.m. EDT
Posted: Sept. 4, 2025, 12:14 p.m. EDT
Set Aside: None
NAICS: 541519 - Other Computer Related Services
PSC: DG10 - IT And Telecom - Network As A Service
Place of Performance: Ellsworth AFB, SD 57706 United States
Source: Open
Current SBA Size Standard: $34 Million
Pricing: Fixed Price
Est. Level of Competition: Average
Est. Value Range (Experimental): $50,000 - $150,000 (AI estimate)

On 9/4/25 Air Combat Command issued Solicitation FA469025Q0062 for Unify OpenScape Maintenance due 9/17/25. The opportunity was issued full & open with NAICS 541519 and PSC DG10.

Primary Contact
Name: Marc L Bellucci
Email: marc.bellucci.1@us.af.mil
Phone: (605) 385-1782

Secondary Contact
Name: Joshua Johnson
Email: joshua.johnson.233@us.af.mil
Phone: (605) 385-1734

Documents
Posted documents for Solicitation FA469025Q0062

Attachment 1 - Ellsworth 28th BW Unify Sustainment PWS.pdf (Air Combat Command, posted 09/04/25, source: Contract Opportunity)
Text Snapshot: This performance work statement (PWS) outlines the requirements for the sustainment and repair of Mitel/Unify systems at Ellsworth Air Force Base, specifically for the 28th Bomb Wing (BW). The contractor is tasked with providing all necessary personnel, equipment, tools, materials, and supervision to ensure the operational status of the Unify system. Key services include maintenance and repair...

Attachment 3 - single source justification Redacted.pdf (Air Combat Command, posted 09/04/25, source: Contract Opportunity)
Text Snapshot: This single source justification is for a simplified acquisition related to the Unify maintenance contract at Ellsworth Air Force Base (AFB). The contracting activity is managed by the 28th Contracting Squadron, and the justification outlines the necessity for continuous maintenance, warranty support, software licenses, and timely updates for the existing Unify OpenScape Voice communication...
Solicitation - FA469025Q0062.pdf (Air Combat Command, posted 09/04/25, source: Contract Opportunity)
Text Snapshot: This solicitation (FA469025Q0062) is for commercial products and services, specifically focused on Unify maintenance services. The solicitation outlines the requirements for a 12-month maintenance contract with options for four additional 12-month periods. The pricing arrangement is firm fixed price, and the total quantity required is specified as one unit for each period. The solicitation is...

Attachment 2 - WD 15-5367 (Rev 29) dated 08jul25 (1).pdf (Air Combat Command, posted 09/04/25, source: Contract Opportunity)
Text Snapshot: This wage determination is issued by the U.S. Department of Labor, specifically under the Service Contract Act, with wage determination no. 2015-5367 and revision no. 29, dated July 8, 2025. It outlines the minimum wage rates and fringe benefits that contractors must pay to workers performing on federal service contracts in South Dakota, particularly in the counties of Custer, Meade, and...

Opportunity Lifecycle
Procurement notices related to Solicitation FA469025Q0062

Unify OpenScape Maintenance (25%)
Type: Solicitation; Agency: Air Combat Command; Set Aside: None; Posted: 09/04/25; Deadline: 09/17/25
Unify OpenScape Maintenance (0%)
Type: Sources Sought; Agency: Air Combat Command; Set Aside: None; Posted: 04/30/25; Deadline: 05/09/25
Description: This Sources Sought / RFI is issued solely for market research purposes in accordance with Federal Acquisition Regulation (FAR) Part 10 and is not a solicitation for proposals. This notice does not obligate the Government to award a contract or otherwise pay for the information provided in response. The Government will use information received in response to this notice to determine the appropriate acquisition strategy for the requirement.

The 28th Maintenance Group (28 MXG) at Ellsworth AFB, South Dakota requires the contractor to provide all personnel, equipment, tools, materials, supervision and any other items and services necessary to accomplish maintenance required. The primary technical services shall be performed by individuals who are Mitel/Unify certified, qualified, and experienced in performing maintenance and repair of equipment, crisis management, dispatch consoles, and all associated Unify OpenScape telecommunications hardware and software installed throughout Ellsworth 28 BW. PWS is attached to this RFI.

NOTE: IF YOU DO NOT INTEND TO SUBMIT A PROPOSAL FOR THIS PROJECT WHEN IT IS FORMALLY ADVERTISED, PLEASE DO NOT SUBMIT A RESPONSE TO THIS SOURCES SOUGHT / RFI.
Information requested: All interested parties are invited to provide information about your company/institution, or any teaming or joint venture partners. Interested vendors are requested to submit the following information, clearly indicating whether you are providing information:
- Company name, address, point of contact with phone number and email address, CAGE code, business size status (e.g., small business, large business), and website (if applicable).
- Manufacturer and model number.
- Detailed product specifications and brochures.
- Maintenance requirements and service support.
- Availability and lead time.

The 28 MXG will review all vendors who respond to this sources sought to determine if other companies can perform the required repairs, and if aftermarket parts can meet the government’s needs.

Submission Instructions: All responses must be submitted electronically to marc.bellucci.1@us.af.mil and joshua.johnson.233@us.af.mil no later than 4:00 PM MDT, Thursday, 09 May 2025. Please include "Sources Sought / Mitel/Unify Annual Sustainment." Questions relevant to this notice shall be sent electronically to the above email addresses. NO PHONE INQUIRIES WILL BE ACCEPTED. All communication shall be in writing and submitted electronically with reference "Mitel/Unify Annual Sustainment."

Disclaimer: This Sources Sought / RFI is issued solely for information and planning purposes and does not constitute a solicitation. The Government is not obligated to award a contract as a result of this announcement. No reimbursement will be made for any costs associated with providing information in response to this announcement or any follow-up requests. The Government shall not be liable for or suffer any consequential damages for any improperly identified information.
Incumbent or Similar Awards
Contracts Similar to Solicitation FA469025Q0062

- FA469022C0004; Awardee: Advancia Aeronautics; Awarding Agency: Air Combat Command; Potential Value: $370.2K; Set Aside: 8AN; Start: 07/26/22; End: 08/09/25; Description: 1 FTE MEDICAL IT SUPPORT TECHNICIAN
- 47QTCA21A001G-FA469025FG018; Awardee: Impres Technology Solutions; Awarding Agency: Air Combat Command; Potential Value: $10.8K; Set Aside: None; Start: 10/01/24; End: 09/30/25; Description: IT AND TELECOM - NETWORK: DIGITAL NETWORK PRODUCTS
- NNG15SC41B-FA486123F0261; Awardee: Iron Bow Technologies; Awarding Agency: Air Combat Command; Potential Value: $4.0M; Set Aside: None; Start: 07/11/23; End: 09/30/25; Description: 805 COMBAT TRAINING SQUADRON SPECIAL ACCESS PROGRAM DIGITAL ENVIRONMENT (SAP DE) EQUIPMENT, CONFIGURATION AND INSTALL.

Potential Bidders and Partners
Awardees that have won contracts similar to Solicitation FA469025Q0062

- Advancia Aeronautics (2024 Obligations: $35.4 million)
- Microtechnologies (2024 Obligations: $142.9 million)
- ENSCO (2024 Obligations: $98.9 million)
- CDO Technologies (2024 Obligations: $13.6 million)
- Referentia Systems (2024 Obligations: $9.1 million)
- UIC Government Services (2024 Obligations: $369.6 million)
- RTX (2024 Obligations: $30.4 billion)
- World Wide Technology (2024 Obligations: $733.9 million)

Similar Active Opportunities
Open contract opportunities similar to Solicitation FA469025Q0062

- Experiments, Prototypes, Research, and Evaluation Supporting Systems (EXPRESS). Agency: Air Force Research Laboratory (AFRL) [DoD - USAF - AFMC]; Deadline: Feb. 14, 2027, 5:00 p.m. EST; Type: Solicitation; Set Aside: None; NAICS: 541715 - Research and Development in the Physical, Engineering, and Life Sciences (except Nanotechnology and Biotechnology)
- Hush House Inspection. Agency: Pacific Air Forces (PACAF) [DoD - USAF]; Deadline: Sept. 23, 2025, 10:00 p.m. EDT; Type: Synopsis Solicitation; Set Aside: None; NAICS: 541350 - Building Inspection Services
- Combat Identification Automated Target Recognition Technology (CATCH) Call 03. Agency: Department of the Air Force (USAF) [DoD]; Deadline: Sept. 18, 2025, 5:00 p.m. EDT; Type: Solicitation; Set Aside: None; NAICS: 541715 - Research and Development in the Physical, Engineering, and Life Sciences (except Nanotechnology and Biotechnology)
- JRE Help Desk. Agency: Air Combat Command (ACC) [DoD - USAF]; Deadline: Sept. 9, 2026, 4:00 p.m. EDT; Type: Solicitation; Set Aside: None; NAICS: 541512 - Computer Systems Design Services
- DATA CENTER MONITORING MODERNIZATION. Agency: U.S. Air Forces Europe and Africa (USAFE) [DoD - USAF]; Deadline: Sept. 15, 2025, 11:00 a.m. EDT; Type: Synopsis Solicitation; Set Aside: None; NAICS: 541513 - Computer Facilities Management Services
- THUNDER COMMERCIALLY AUGMENTED MISSION PLATFORM (CAMP): DEVSECOPS SOFTWARE LICENSE. Agency: Air Force Sustainment Center (AFSC) [DoD - USAF - AFMC]; Deadline: Sept. 20, 2025, 4:00 p.m. EDT; Type: Sources Sought; Set Aside: None; NAICS: 541519 - Other Computer Related Services

Additional Details
Source Agency Hierarchy: DEPT OF DEFENSE > DEPT OF THE AIR FORCE > AFGSC > FA4690 28 CONS PKC
FPDS Organization Code: 5700-FA4690
Source Organization Code: 500022516
Last Updated: Sept. 4, 2025
Last Updated By: joshua.johnson.233@us.af.mil
Archive Date: Oct. 2, 2025

      Specifically, the contractor must:

      • Be a Mitel/Unify Systems certified partner authorized to perform on-network installations and maintenance.
    1. It is important to understand that this does not mean LLMs will be gods producing 100x code, because virtually no domain in which software engineering is useful has a perfect oracle. A perfect oracle is a type of feedback where you are given a “correct/incorrect” answer every single time; such oracles appear almost only in games, as the real world typically doesn’t have perfect models of correctness. Winning or losing a game is a perfect oracle, as is writing a program that passes the judge in a competitive programming contest.

      important and impressive advice

    1. Discuss some of the social norms that guide conversational interaction. Identify some of the ways in which language varies based on cultural context. Explain the role that accommodation and code-switching play in communication. Discuss cultural bias in relation to specific cultural identities.

      When we talk with others, there are social rules like taking turns and making eye contact that help things flow smoothly. How people use language changes a lot depending on their culture—some are more formal, others more laid-back, and what’s polite somewhere might not be in another place. Sometimes we switch up how we talk or change our style to fit in better, which is called accommodation or code-switching. Cultural bias is when we judge people unfairly based on where they come from, and that can cause misunderstandings. Knowing about these biases helps us be more respectful and avoid stepping on toes.

  6. www.assemblee-nationale.fr www.assemblee-nationale.fr
    1. SUMMARY DOCUMENT: Parenting Support Policies in France

      Source: Information Report No. 1638, Assemblée nationale, Délégation aux droits des femmes et à l’égalité des chances entre les hommes et les femmes (Delegation for Women's Rights and Equal Opportunities between Men and Women), on parenting support policies, presented by Ms. Sarah Legrain and Ms. Delphine Lingemann, registered on 24 June 2025.

      Executive Summary

      This report by the Delegation for Women's Rights and Equal Opportunities between Men and Women highlights the persistent inequalities in how domestic and parental responsibilities are shared in France, where they are mostly shouldered by women.

      It shows that parenthood, far from being gender-neutral, is a major cause of economic, professional, and social inequalities between men and women. The "parental penalty" significantly affects women's careers and incomes, while men are largely spared.

      The rapporteurs identify several key areas for encouraging a more equal division of parental tasks and promoting a positive, egalitarian vision of parenthood, and they make 44 recommendations to achieve this.

      These recommendations cover education and information, taking parenthood into account at work, supporting parents from the desire to have a child onward, overhauling the parental leave and childcare systems, supporting parents of adolescents, and supporting single-parent families.

      Main Themes and Key Ideas

      1. The Unequal Domestic and Parental Load: A Brake on Women's Equality

      • Persistent Gendered Division: Despite an impression of equality, women continue to take on most domestic and parental responsibilities. On average, they perform 71% of the household's domestic tasks and 65% of its parental tasks. This division is deeply rooted in a historical legacy and in stubborn gender stereotypes.
      • Gender Stereotypes: The idea that "mothers know better than fathers how to respond to children's needs and expectations" is widespread among adults (60% agree) and persists among young people (54% of 18-24 year olds). These stereotypes contribute to the social devaluation of tasks regarded as feminine.
      • Mothers' "Double Day": The arrival of children worsens this inequality. For women, it represents about "five hours of additional work", whereas for men it "reduces their domestic and parental time by two hours". Employed women combine professional, domestic, and parental work, totaling "eleven hours a day compared with less than ten hours for men".
      • Impact of the Leave System: The difference in length between maternity leave (16 weeks) and paternity leave (28 days) reinforces the dynamic of a "mother as 'primary parent' and a father as 'auxiliary'". Parental leave is also taken overwhelmingly by mothers (94% of cases), which penalizes their careers.
      • Childcare Difficulties: The shortage and uneven distribution of places in nurseries and with childminders often force mothers to compensate for the system's failings. "Nearly 20% of parents do not obtain a childcare place, and more than 160,000 do not return to work for lack of a childcare solution for their child", with mothers serving as the "adjustment variable".

      2. Heavy Consequences for Mothers: Human, Economic, and Social Costs

      • Parental Penalty at Work: Parenthood has a "negative impact on women's career paths", whereas it has "no effect, or almost none, on men's career progression". "90% of income inequalities between women and men are directly due to the 'parental penalty' borne by women". Ten years after the arrival of the first child, women's average income has fallen by about 38%.
      • Discrimination: More than six women in ten consider that being a mother holds back their career. 27% of women who report being discriminated against at work believe that the discrimination is linked to pregnancy or maternity leave.
      • Vulnerability of Single Mothers: Single mothers (82% of single-parent families) are particularly affected. They face a "triple intersecting penalty: their gender, their employment situation […], their family situation", which exposes them to precarious, poorly paid jobs and increases their risk of poverty. "Nearly one single mother in five is poor even though she has a job".
      • Risk of Exhaustion and Mental Health: The disproportionate load creates a "real risk of exhaustion for mothers". Isolation can contribute to post-partum depression, which affects about 20% of women and is the "leading cause of maternal mortality in the year following the child's birth".
      • High Economic Cost: Beyond the loss of income due to maternity leave and reduced working hours, separation carries a "heavy cost for mothers". One separated woman in three "falls below the poverty line in the year of the separation", with her standard of living dropping by about 20% (compared with 7% for men). 39% of children living in single-parent families are in poverty.

      3. Proposals for Egalitarian Parenthood

      The rapporteurs make 44 recommendations to transform parenting support policies, with a focus on equality:

      Education and Information:

      • Introduce "domestic activities classes" in primary or lower secondary school to teach these skills to all children.
      • Launch "national campaigns against gender stereotypes" about parenthood.
      • Adopt "neutral terminology" (e.g., "école pré-élémentaire" (pre-elementary school) instead of "école maternelle", and "prestation pour naissance et soin du mineur" (benefit for the birth and care of a minor) instead of "congé maternité/paternité" (maternity/paternity leave)).
      • Improve the information given to parents about support schemes.

      Taking Parenthood into Account at Work:

      • Make parenthood part of "corporate social responsibility (CSR)" and extend the "Charte de la parentalité" (Parenthood Charter) to all companies with more than 50 employees.
      • Amend the Labor Code (Code du travail) to explicitly include parenthood in "company negotiations on professional equality".
      • Include parenthood criteria in the future "professional equality index".
      • Grant parents "authorized absences" (4 half-days per year) for key moments in their children's schooling.

      Support from the Desire for a Child through Post-Partum:

      • Broaden "pre-conception consultations" to cover the parenting plan, and allow the second parent to attend all mandatory medical appointments during pregnancy.
      • Devote one childbirth preparation session to the "parenting plan".
      • Strengthen the leave scheme available when a pregnancy is interrupted and extend it to voluntary terminations, with an authorized absence for the partner.
      • Make "breastfeeding consultations" easier to access and strengthen "practitioner training on post-partum depression".
      • Provide an "optional, 100% reimbursed consultation with a psychologist" for mothers in the three months after birth.
      • Extend "family caregiver leave" (congé de proche aidant) to the second parent supporting a mother suffering from post-partum depression.
      • Combat mothers' isolation by offering parents the chance "to be put in touch with other parents welcoming their child at the same time".

      Reform of Leave and Childcare:

      • Paternity Leave: "Gradually extend paternity leave to sixteen weeks, on a par with maternity leave". Eight weeks would be mandatory (4 at birth, 4 after the mother's maternity leave) and eight would be optional and could be split up. This measure is a "key lever for equality between parents" and responds to fathers' wish to be more involved.
      • Parental Leave: Reform parental leave by "making it more financially attractive without reducing its reach for low-income households", and consider a "gradual return to work" after the leave.
      • Childcare: Guarantee the "clarity and transparency" of childcare options, "invest to increase and even out nursery provision across the country", and "raise the status of early childhood professions".

      Support for Parents of Adolescents:

      • Broaden the remit of "child-parent drop-in centers" (lieux d’accueil enfants-parents) so that they can "receive adolescents".
      • Introduce "public policies specifically targeting parents of adolescents".
      • Strengthen provision in "child psychiatry" and "school health services" in response to the deterioration in young people's mental health.
      • Launch a "national information campaign on the mental health of children and adolescents".

      Support for Single-Parent Families:

      • Rethink the "method for calculating child support payments" to reflect the real cost of raising a child, and allow the receiving parent to have "child support exempted from tax".
      • Introduce an "allowance on the amount of child support counted in the income bases for family benefits and housing assistance, up to the level of the family support allowance (allocation de soutien familial, ASF)".
      • "Decouple payment of the ASF from the recipient's couple status" and "open housing allowances (APL) to both parents" to make it easier to accommodate the children.
      • Give single mothers the "option of transferring leave entitlements to a close relation of their choice" and "double the number of 'sick child' days".
      • Study the creation of a "single-parent family status" with specific rights.

      Rapporteurs' Conclusion

      The rapporteurs state that, despite some progress, mothers remain the "primary parent", which has negative consequences for their health and their working lives.

      An "ambitious reform of the leave system", in particular of leave for the second parent, is an essential "driver of equality".

      Drawing on the Scandinavian and Spanish models, France can move toward egalitarian parenthood, not only for the emancipation of women but also as a response to demographic concerns.

    1. Reviewer #2 (Public review):

      Summary:

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data.

      Strengths:

      Although many tools exist in the single-particle tracking (SPT) field, this particular software package is developed using an innovative mathematical model and a probabilistic approach. It also provides inference of motion types, which is critical for answering biological questions in SPT experiments.

      (1) The authors adopt a novel mathematical framework, which is unique in the SPT field.

      (2) The authors have validated their method extensively using simulated tracks and compared to existing methods when appropriate.

      (3) The code is freely available.

      Weaknesses:

      The authors did a good job during the revision of addressing most of the weaknesses raised in my (as well as the other reviewers') first round of review. Nevertheless, the following issue is still not fully addressed.

      The hypothesis testing method presented here lacks a rigorous statistical foundation. The authors improved on this point in the revision, but their newly added SI section "Statistical Test" justifies their choices only with "hand-waving" arguments (i.e., there is not a single reference to proper statistical textbooks or earlier works in this important section). I understand that mathematical rigor sometimes comes later, after intuition-guided choices of critical parameters seem to work, but I nevertheless need to point this out as a remaining weakness.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Weiss and co-authors presented a versatile probabilistic tool. aTrack helps in classifying tracking behaviors and understanding important parameters for different types of single particle motion types: Brownian, Confined, or Directed motion. The tool can be used further to analyze populations of tracks and the number of motion states. This is a stand-alone software package, making it user-friendly for a broad group of researchers. 

      Strengths: 

      This manuscript presents a novel method for trajectory analysis. 

      Weaknesses: 

      (1) In the results section, is there any reason to choose the specific range of track length for determining the type of motion? The starting value is fine, and would be short enough, but do the authors have anything to report about how much is too long for the model? 

      We chose to test a range of track lengths (from five to hundreds of steps) to cover the broad range of scenarios, from single proteins or fluorophores to brighter objects with more labels. While there is no upper limit per se, the computation time of our method scales linearly with track length: 100 time points take ~2 minutes to run on a standard consumer-level desktop CPU. We have added the following sentence to note the time cost associated with trajectory length:

      “The recurrent formula enables our model computation time to scale linearly with the number of time points.”

      (2) Robustness to model mismatches is a very important section that the authors have improved diligently. Understanding where and how the model is limited is important. For example, the authors mentioned the limitation of trajectory length; do the authors have any information on the trajectory length range at which this method works accurately? This would be of interest to readers who would like to apply this method to their own data.

      We agree that limitations are important to estimate, and trajectory length is an important consideration when choosing how to analyze a dataset. We report the categorization certainty, i.e. the likelihood differences, for a range of track lengths (Fig. 2 a,c, Fig. 3c-d, and Fig. 4 c,g.).

      For example, here are the key plots from Fig. 2 quantifying the relative likelihoods, where being within the light region is necessary. The light areas represent a useful likelihood ratio.

      We only performed analysis up to track lengths of 600 time steps, but parameter estimates and significance can only improve with increasing track length, as long as the model assumptions hold. The broader limitations and future opportunities for new methods are now expanded upon in the discussion, for example switching between states, and state and model ambiguities (bound vs. very slow diffusion vs. very slow motion).

      (3) aTrack extracts certain parameters from the trajectories to determine the motion types. However, it is not very clear how certain parameters are calculated. For example, is the diffusion coefficient D calculated from fitting, and how is the confinement factor defined and estimated, with equations? This information will help the readers to understand the principles of this algorithm.

      We apologize for the confusion. All the model parameters are fit using the maximum likelihood approach. To make this point clearer in the manuscript, we have made three changes:

      (1) We modified the following sentence to replace “determined” with "fit”:

      “Finally, Maximum Likelihood Estimation (MLE) is used to fit the underlying parameter value”

      (2) We added the following sentence in the main text :

      “In our model, the velocity is the characteristic parameter of directed motion and the confinement factor represents the force within a potential well. More precisely, the confinement factor $l$ is defined such that at each time step the particle position is updated by $l$ times the distance between the particle and the potential well center (see the Methods section for more details).”

      (3) We have added a new section in the methods, called Fitting Method, where we have added the explanation below:

      “For the pure Brownian model, the parameters are the diffusion coefficient and the localization error. For the confinement model, the parameters are the diffusion coefficient, the localization error, confinement factor, and the diffusion coefficient of the potential well. For the directed model, the parameters are the diffusion coefficient, the localization error, the initial velocity and the acceleration variance.

      These parameters are estimated using the maximum likelihood approach which consists in finding the parameters that maximize the likelihood. We realize this fitting step using gradient descent via a TensorFlow model. All the estimates presented in this article are obtained from a single set of initial parameters to demonstrate that the convergence capacity of aTrack is robust to the initial parameter values.”
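
As a rough illustration of fitting by gradient descent on a likelihood, the sketch below handles only the simplest case: 1D Brownian steps with no localization error, written in plain Python rather than TensorFlow. The function names, the log-variance parameterization, and the learning rate are illustrative assumptions and are not taken from aTrack.

```python
import math
import random


def simulate_brownian_steps(n_steps, diffusion_coef, dt=1.0, seed=0):
    """Draw 1D Brownian displacements with variance 2 * D * dt (no localization error)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusion_coef * dt)
    return [rng.gauss(0.0, sigma) for _ in range(n_steps)]


def fit_diffusion_mle(steps, dt=1.0, learning_rate=0.5, n_iter=500):
    """Estimate D by gradient descent on the mean negative log-likelihood.

    The step variance is parameterized as exp(theta) so it stays positive.
    Per-step negative log-likelihood, up to an additive constant:
        0.5 * theta + 0.5 * msd / exp(theta)
    """
    msd = sum(s * s for s in steps) / len(steps)
    theta = 0.0  # single, arbitrary initial value (assumption)
    for _ in range(n_iter):
        grad = 0.5 - 0.5 * msd / math.exp(theta)
        theta -= learning_rate * grad
    return math.exp(theta) / (2.0 * dt)


steps = simulate_brownian_steps(n_steps=5000, diffusion_coef=0.1)
d_hat = fit_diffusion_mle(steps)
# For this toy model the maximum-likelihood estimate is also available in closed form.
d_closed_form = sum(s * s for s in steps) / len(steps) / 2.0
print(f"gradient-descent D = {d_hat:.4f}, closed-form D = {d_closed_form:.4f}")
```

In this toy case the gradient-descent result can be checked against the closed-form estimate. The confined and directed models described above have hidden variables and do not reduce to such a simple expression.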

      (4) The authors mentioned the scenario where a particle may experience several types of motion simultaneously. How are these motions simulated and what do they mean in terms of motion types? Are they mixed motion (a particle switches motion types in the same trajectory) or do they simply present features of several motion types? It is not intuitive to the readers that a particle can be diffusive (Brownian) and direct at the same time.

      In the text, we present an example where one can observe this type of motion to help the reader understand when this type of motion can be met: “Sometimes, particles undergo diffusion and directed motion simultaneously, for example, particles diffusing in a flowing medium (Qian 1991).”

      This is simulated by the addition of two terms affecting the hidden position variable before adding a localization term to create the observed variable. In the analysis, this manifests as non-zero values for the diffusion coefficient and the linear velocity. See, for example, Figure 4g and the associated text, where a single particle moves with a directed component and a Brownian diffusion component at each step.
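
The simulation scheme described in this reply (two terms added to the hidden position, then a localization term to form the observation) can be sketched in 1D as follows. This is a minimal sketch with a constant velocity and arbitrary units; `simulate_track` and its parameters are hypothetical names, not the authors' simulation code.

```python
import math
import random


def simulate_track(n_steps, diffusion_coef, velocity, loc_error, dt=1.0, seed=0):
    """Return (hidden, observed) 1D positions for simultaneous diffusion and drift."""
    rng = random.Random(seed)
    step_sigma = math.sqrt(2.0 * diffusion_coef * dt)
    hidden = [0.0]
    for _ in range(n_steps):
        brownian_term = rng.gauss(0.0, step_sigma)  # diffusive contribution
        directed_term = velocity * dt               # directed contribution
        hidden.append(hidden[-1] + brownian_term + directed_term)
    # Localization error is added independently to each hidden position.
    observed = [x + rng.gauss(0.0, loc_error) for x in hidden]
    return hidden, observed


hidden, observed = simulate_track(
    n_steps=200, diffusion_coef=0.01, velocity=0.02, loc_error=0.02
)
mean_step = (observed[-1] - observed[0]) / (len(observed) - 1)
print(f"mean observed displacement per step: {mean_step:.4f}")
```

Setting `diffusion_coef=0` gives purely directed motion, and setting `velocity=0` gives purely Brownian motion, so both terms being non-zero corresponds to the mixed case discussed here.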

      We did not simulate transitions between types of motion. Switching is not treated by this current model; however, this limitation is described in the discussion and our team and others are currently working on addressing this challenge.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data. 

      Strengths: 

      A potential advantage of the presented method is its wide applicability to different motion types. 

      Weaknesses: 

      (1) There has been a lot of similar work in this field. Even though the authors included many relevant citations in the introduction, it is still not clear what this work uniquely offers. Is it the first time that direct MLE of the time-series data was developed? Suggestions to improve would include (a) better wording in the introduction section, (b) comparing to other popular methods (based on MSD, step-size statistics (Spot-On, eLife 2018;7:e33125), for example) using the simulated dataset generated by the authors, (c) comparing to other methods using data set in challenges/competitions (Nat. Comm (2021) 12:6253).  

      We thank the reviewer for this suggestion and agree that the explanation of the innovative aspects of our method in the introduction was not clear enough. We have now modified the introduction to better explain what is improved here compared to previous approaches.

      “The main innovations of this model are: 1) it uses analytical recurrence formulas to perform the integration step for complex motion, improving speed and accuracy; 2) it handles both confined and directed motion; 3) anomalous parameters, such as the center of the potential well and the velocity vector are allowed to change through time to better represent tracks with changing directed motion or confinement area; and lastly 4) for a given track or set of tracks, aTrack can determine whether tracks can be statistically categorized as confined or directed, and the parameters that best describe their behavior, for example, diffusion coefficient, radius of confinement, and speed of directed motion.”

      Regarding alternatives, we compare our method in the text (Figure 6) to the best-performing algorithm of the 2021 Anomalous Diffusion (AnDi) Challenge mentioned by the reviewer (RANDI, Argun et al, arXiv, 2021, Muñoz-Gil et al, Nat Com. 2021). Notably, both methods performed similarly on fBm, but ours was more robust in cases where there were small differences between the process underlying the data and the model assumptions, a likely scenario in real datasets. Regarding Spot-On, this was not mentioned as it only deals with multiple populations of Brownian diffusers, preventing a quantitative comparison.

      (2) The Hypothesis testing method presented here has a number of issues: first, there is no definition of testing statistics. Usually, the testing statistics are defined given a specific (Type I and/or Type II) error rate. There is also no discussion of the specificity and sensitivity of the testing results (i.e. what's the probability of misidentification of a Brownian trajectory as directed? etc).

      We now explain our statistical approach and how to perform hypothesis testing with our metric in a new supplementary section, Statistical test. 

      We use the likelihood ratio as a more conservative alternative to the p-value. In Fig S2, we show that our metric is an upper bound of the p-value and can be used to perform hypothesis testing with a chosen type I error rate. 

      Related, it is not clear what Figure 2e (and other similar plots) means, as the likelihood ratio is small throughout the parameter space. Also, for likelihood ratio tests, the authors need to discuss how model complexity affects the testing outcome (as more complex models tend to be more "likely" for the data) and also how the likelihood function is normalized (normalization is not an issue for MLE but critical for ratio tests). 

      We present the likelihood ratio as an upper bound of the p-value. Therefore, we can reject the null hypothesis if it is smaller than a given threshold, e.g. 0.05, but this number should be decreased if multiple tests are performed. The colorscale we show in the figure is meant to highlight the working range (light), and ambiguous range (dark) of the method.
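
The decision rule described here can be sketched as follows. The Bonferroni division by the number of tests is one common way to lower the threshold and is an assumption, as are the function names and the cap of the ratio at 1.

```python
import math


def likelihood_ratio(log_l_null, log_l_alt):
    """Ratio L_null / L_alt computed from log-likelihoods, capped at 1 (assumption)."""
    return math.exp(min(0.0, log_l_null - log_l_alt))


def reject_null(log_l_null, log_l_alt, alpha=0.05, n_tests=1):
    """Reject the null if the ratio is below a Bonferroni-adjusted threshold."""
    return likelihood_ratio(log_l_null, log_l_alt) < alpha / n_tests


# Hypothetical log-likelihoods for one track under the null and alternative models.
ratio = likelihood_ratio(-105.0, -100.0)
single_test = reject_null(-105.0, -100.0)
ten_tests = reject_null(-105.0, -100.0, n_tests=10)
print(ratio, single_test, ten_tests)
```

With these hypothetical values the ratio is exp(-5): below 0.05 for a single test, but above the adjusted threshold of 0.005 when ten tests are performed.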

      As the reviewer mentions, we expect the alternative hypothesis to result in higher likelihoods than the simpler null hypothesis for null hypothesis tracks, but, as seen in Fig. S2, the likelihood ratio of a dataset corresponding to the null hypothesis is strongly skewed toward its upper limit of 1. This means that for most of the tracks, the likelihood is not (or little) affected by the model complexity. The likelihoods of all the models are normalized so that their integrals over the data equal 1/A, with A the area of the field of view, which is independent of the model complexity.

      (3) Relating to the mathematical foundation (Figure 1b). The measured positions are drawn as direct arrows from the real position states: this infers instantaneous localization. In reality, there is motion blur which introduces a correlation of the measured locations. Motion blur is known to introduce bias in SPT analysis, how does it affect the method here? 

      The reviewer raises an important point as our model does not explicitly consider motion blur. We have now added a paragraph that presents how our model performs in case of motion blur in the section called Robustness to model mismatches. This section and the corresponding new Supplemental Fig. S7 demonstrate that the estimated diffusion length is accurate so long as the static localization error is higher than the dynamic localization error. If the dynamic localization error is higher, our model systematically underestimates the diffusion length by a factor of $(2/3)^{0.5} \approx 0.81$, which can be corrected for with an added post-processing step.

      (4) The authors did not go through the interpretation of the figure. This may be a matter of style, but I find the figures ambiguous to interpret at times.  

      We thank the reviewer for their feedback on improving the readability. To avoid overly repetitive and lengthy sections of text, we have opted for a concise approach. This allows us to present closely related panels at the same point in the text, while not ignoring important variations and tests. Considering this feedback from the reviewers, we have added more information and interpretation throughout our manuscript to improve interpretability.

      (5) It is not clear to me how the classification of the 5 motion types was accomplished. 

      We have modified the specific text related to this figure to describe an illustrative example to show how one could use aTrack on a dataset where not that much is known: First, we present the method to determine the number of states; second, we verify the parameter estimates correspond to the different states.  

      Classifying individual tracks is possible. While not done in the section corresponding to Fig. 5, this is done in Fig. 7 and a new supplementary plot, Fig. S9b (shown below). In brief, this is accomplished with our method by computing the likelihood of each track given each state. The probability that a given track is in state k equals the likelihood of the track given the state divided by the sum of the likelihoods given the different states. 
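
The per-track state probability described above (the likelihood given one state divided by the sum of the likelihoods over all states) can be sketched as follows. Working from log-likelihoods and subtracting the maximum before exponentiating is an implementation choice for numerical stability, and the function name and example values are hypothetical.

```python
import math


def state_probabilities(log_likelihoods):
    """Normalize the per-state likelihoods of one track so that they sum to 1."""
    max_log = max(log_likelihoods)
    # Subtracting the maximum avoids underflow for very negative log-likelihoods.
    weights = [math.exp(ll - max_log) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log-likelihoods of one track under three states.
probs = state_probabilities([-120.0, -118.0, -125.0])
best_state = max(range(len(probs)), key=probs.__getitem__)
print(probs, best_state)
```

This treats all states as equally weighted, which matches the formula as stated in the reply; weighting by state fractions would be a different choice.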

      (6) Figure 3. Caption: what is ((d_{est}-0.1)/0.1)? Also panel labeled as "d" should be "e". 

      Thank you for bringing these errors to our attention; the panel and caption have been corrected.

      Reviewer #3 (Public Review): 

      Summary: 

      In this work, Simon et al present a new computational tool to assess non-Brownian single-particle dynamics (aTrack). The authors provide a solid groundwork to determine the motion type of single trajectories via an analytical integration of multiple hidden variables, specifically accounting for localization uncertainty, directed/confined motion parameters, and, very novel, allowing for the evolution of the directed/confined motion parameters over time. This last step is, to the best of my knowledge, conceptually new and could prove very useful for the field in the future. The authors then use this groundwork to determine the motion type and its corresponding parameter values via a series of likelihood tests. This accounts for obtaining the motion type which is statistically most likely to be occurring (with Brownian motion as null hypothesis). Throughout the manuscript, aTrack is rigorously tested, and the limits of the methods are fully explored and clearly visualised. The authors conclude with allowing the characterization of multiple states in a single experiment with good accuracy and explore this in various experimental settings. Overall, the method is fundamentally strong, well-characterised, and tested, and will be of general interest to the single-particle-tracking field.

      Strengths: 

      (1) The use of likelihood ratios gives a strong statistical relevance to the methodology. There is a sharp decrease in likelihood ratio between e.g. confinement of 0.00 and 0.05 and velocity of 0.0 and 0.002 (figure 2c), which clearly shows the strength of the method - being able to determine 2nm/timepoint directed movement with 20 nm loc. error and 100 nm/timepoint diffusion is very impressive. 

      We apologize for the confusion: the directed tracks in Fig. 2 have no Brownian-motion component, i.e. D = 0. We have made this clearer in the main text. Specifically, this section of the text refers to a track in linear motion with 2 nm displacements per step. With 70 time points (69 steps), a single particle that has moved 138 nm with a localization error of 20 nm (95% uncertainty range of 80 nm) can be statistically distinguished from slow diffusive motion.

      In Fig. 4g, we explore the capabilities of our method to detect if a diffusive particle also has a directed motion component. 
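
The numbers quoted for the purely directed track in Fig. 2 follow from simple arithmetic. The sketch below assumes a Gaussian localization error and reads the 95% range as roughly $\pm 2\sigma$; that convention is an assumption, not something stated in the reply.

$$
69\ \text{steps} \times 2\ \text{nm/step} = 138\ \text{nm}, \qquad 4\sigma = 4 \times 20\ \text{nm} = 80\ \text{nm}.
$$

The total displacement therefore exceeds the width of the single-localization uncertainty range, which is consistent with the track being distinguishable from slow diffusion.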

      (2) Allowing the hidden variables of confinement and directed motion to change during a trajectory (i.e. the q factor) is very interesting and allows for new interpretations of data. The quantifications of these variables are, to me, surprisingly accurate, but well-determined. 

      (3) The software is well-documented, easy to install, and easy to use. 

      Weaknesses: 

      (1) The aTrack principle is limited to the motions incorporated by the authors, with, as far as I can see, no way to add new analytical non-Brownian motion. For instance, being able to add a dynamical state-switching model (i.e. quick on/off switching between mobile and non-mobile, for instance, repeatable DNA binding of a protein), could be of interest. I don't believe this necessarily has to be incorporated by the authors, but it might be of interest to provide instructions on how to expand aTrack.

      We agree that handling dynamic state switching is very useful and highlight this potential future direction in the discussion. The revised text reads:

      “An important limitation of our approach is that it presumes that a given track follows a unique underlying model with fixed parameters. In biological systems, particles often transition from one motion type to another; for example, a diffusive particle can bind to a static substrate or molecular motor (46). In such cases, or in cases of significant mislinkings, our model is not suitable. However, this limitation can be alleviated by implicitly allowing state transitions with a hidden Markov Model (15) or alternatives such as change-point approaches (30, 47, 48), and spatial approaches (49).”

      (2) The experimental data does not very convincingly show the usefulness of aTrack. The authors mention that SPBs are directed in mitosis and not in interphase. This can be quantified and studied by microscopy analysis of individual cells and confirming the aTrack direction model based on this, but this is not performed. Similarly, the size of a confinement spot in optical tweezers can be changed by changing the power of the optical tweezer, and this would far more strongly show the quantitative power of aTrack. 

      We agree with the reviewer and have revised the biological experiment section significantly to better illustrate the potential of aTrack in various use cases.

      Now, we show an experiment to quantify the effect of LatA, an actin inhibitor, on the fraction of directed tracks obtained with aTrack. We find that LatA significantly decreases directed motion while a LatA-resistant mutant is not affected (Fig. 7a-c).

      As suggested by the reviewer, we have expanded the optical tweezer experiment by varying the laser power. As expected, increasing the laser power decreases the confinement radius.

      (3) The software has a very strict limit on the number of data points per trajectory, which is a user input. Shorter trajectories are discarded, while longer trajectories are cut off to the set length. It is not explained why this is necessary, and I feel it deletes a lot of useful data without clear benefit (in experimental conditions).

      We thank the reviewer for this recommendation; we have now modified the architecture of our model to enable users to consider tracks of multiple lengths. Note that the computation time is proportional to the longest track length times the number of tracks.  

      Reviewer #2 (Recommendations For The Authors): 

      Develop a better mathematical foundation for the likelihood ratio tests. 

      To address this recommendation, we added more explanation of the likelihood ratio tests and their interpretation in a new section entitled Statistical test in the supplementary information.

      Place this work in clearer contexts. 

      We have now revised the introduction to better contextualize this work.

      Improve manuscript clarity. 

      Based on reviewer feedback and input from others, we have addressed this point throughout the article to improve readability.

      Make the code available. 

      The code is available on https://github.com/FrancoisSimon/aTrack, now including code for track generation.

      Reviewer #3 (Recommendations For The Authors): 

      (1) I believe the underlying model presented in Figure 1 is of substantial impact, especially when considering it as a simulation tool. I would suggest the authors make their method also available as a simulator (as far as I can tell, this is not explicitly done in their code repository, although logically the code required for the simulator should already be in the codebase somewhere). 

      Thank you for this suggestion; the simulation scripts are now in the GitHub repository, together with the rest of the analysis method: https://github.com/FrancoisSimon/aTrack

      (2) The authors should explore and/or discuss the effects of wrong trajectory linking to their method. Throughout the text, fully correct trajectory linking is assumed and assessed, while in real experiments, it is often the case that trajectory linking is wrong, e.g. due to blinking emitters, imaging artefacts, high-density localizations, etc etc. This would have a major impact on the accuracy of trajectories, and it is extremely relevant to explore how this is translated to the output of aTrack. 

      As the reviewer notes, our current model does not account for track mislinking. This limits the method to data with lower fluorophore-densities, which is the typical use-case for SPT. We have added a brief description of the issue into the discussion of limitations.  

      (3) aTrack only supports 2D-tracking, but I don't believe there is a conceptual reason not to have this expanded to three dimensions. 

      The stand-alone software is currently limited to 2D tracks; however, the aTrack Python package works for any number of dimensions (i.e. 1-3). Note that since the current implementation assumes a single localization error for all axes, more modifications may be required for some types of 3D tracking. See https://github.com/FrancoisSimon/aTrack for more details about aTrack implementations.

      (4) Crucial information is missing in the experimental demonstrations. Especially in the NP-bacteria dataset, I miss scalebars, and information on the number of tracks. It is not explained why 5 different states are obtained - especially because I would naively expect three states: immobile NPs (e.g. stuck to glass), diffusing NPs, and NPs attached to bacteria, and thus directed. Figure 7e shows three diffusive states (why more than one?), no immobile states (why?), and two directed states (why?). 

      We thank the reviewer for pointing out these issues. We have now added scalebars and more experimental details to the figure and text, and modified the plot to more clearly distinguish the directed nanoparticles that are attached to cells from the diffusive nanoparticles.

      Likely, our focal plane was too high to see the particles stuck on glass. The multiple diffusive states may be caused by different sizes of nanoparticle complexes, while the multiple directed states may arise because the directed motion of the cell-attached nanoparticles occasionally shows drastic changes of orientation. We have also clarified in the text how multiple states can help handle a heterogeneous population as was shown by Prindle et al. 2022, Microbiol Spectr. The characterization and phenotyping of microbial populations by nanoparticle tracking was published in Zapata et al. 2022, Nanoscale.

      (5) I don't think I agree that 'robustness to model mismatches' is a good thing. Very crudely, the fact that aTrack finds fractional Brownian motion to be normal Brownian motion is technically a downside - and this should be especially carefully positioned if (in the future) a fractional Brownian motion model would be added to aTrack. I think that the author's point can be better tested by e.g. widely varying simulated vs fitted loc precision/diffusion coefficient (which are somewhat interchangeable).

      In this context, our intention in describing the robustness to “model mismatches” refers to classifying subdiffusion as subdiffusive irrespective of the exact subdiffusion motion physics (as well as superdiffusion), that is, to use aTrack how MSD analysis is often deployed. This is important in the context of real-world applications where simple mathematical models cannot perfectly represent real tracks with greater complexity. 

      Inevitably, some fraction of tracks with a pure Brownian motion may appear to match with a fractional Brownian motion, and thus statistical tests are needed to determine if this is significant. In general, aTrack finds fBm to be normal Brownian motion only when the anomalous coefficient is near 1, i.e. when the two models are indeed the same. When analysing fBm tracks with anomalous coefficients of 0.5 or 1.5, aTrack finds that these tracks are better explained by our confined diffusion model or directed motion model, respectively (please see Fig. 6a, copied below).

      To better clarify our objective, the section now has a brief introduction that reads:

      “One of the most important features of a method is its robustness to deviations from its assumptions. Indeed, experimental tracking data will inevitably not match the model assumptions to some degree, and models need to be resilient to these small deviations.”  

      Smaller points: 

      (1) It is not clear what a biological example is of rotational diffusion. 

      We modified the text to better explain the use of rotational diffusion.

      (2) The text in the section on experimental data should be expanded and clarified, there currently are multiple 'floating sentences' that stop halfway, and it does not clearly describe the biological relevance and observed findings.  

      We thank the reviewer for pointing out this issue. We have reworked the experimental section to better and more clearly explain the biological relevance of the findings.

      (3) Caption of figure 3: 'd' should be 'e'. 

      (4) Caption of Figure 7: log-likelihood should be Lconfined - Lbrownian, I believe. 

      (5) Equation number missing in SI first sentence. 

      (6) Supplementary Figure 1 top part access should be Lc-Lb instead of Ld-Lb. 

      We have made these corrections, thank you for bringing them to our attention.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e., spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments, including multiple species, tasks, behavioral modalities, and pharmacological interventions.

      Thanks for the kind words!

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process - motion perception - and in a much simpler organism - Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular - both in its ability to interface with different inputs/outputs and its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments, especially in the context of cross-experiment predictions (but see some criticism of that below).

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly (but see a related concern below).

      We sincerely thank the reviewer for their thoughtful and generous comments. We are especially pleased that the core strengths of the model—its stimulus-computable architecture, biological grounding, modularity, and cross-domain applicability—were clearly recognized. As the reviewer rightly notes, removing researcher-defined abstractions and working directly from naturalistic stimuli opens the door to uncovering previously overlooked dynamics in complex multisensory signals, such as the spatial and temporal richness of audiovisual speech.

      We also appreciate the recognition of the model’s origins in a simple organism and its generalization across species and behaviors. This phylogenetic continuity reinforces our view that the MCD captures a fundamental computation with wide-ranging implications. Finally, we are grateful for the reviewer’s emphasis on the model’s predictive power across tasks and datasets with few or no free parameters—a property we see as key to both its parsimony and explanatory utility.

      We have highlighted these points more explicitly in the revised manuscript, and we thank the reviewer for their generous and insightful endorsement of the work.

      Weaknesses:

      There is an insufficient level of detail in the methods about model fitting. As a result, it's unclear what data the models were fitted and validated on. Were models fit individually or on average group data? Each condition separately? Is the model predictive of unseen data? Was the model cross-validated? Relatedly, the manuscript mentions a randomization test, but the shuffled data produces model responses that are still highly correlated to behavior despite shuffling. Could it be that any stimulus that varies in AV onset asynchrony can produce a psychometric curve that matches any other task with asynchrony judgements baked into the task? Does this mean all SJ or TOJ tasks produce correlated psychometric curves? Or more generally, is Pearson's correlation insensitive to subtle changes here, considering psychometric curves are typically sigmoidal? Curves can be non-overlapping and still highly correlated if one is, for example, scaled differently. Would an error term such as mean-squared or root mean-squared error be more sensitive to subtle changes in psychometric curves? Alternatively, perhaps if the models aren't cross-validated, the high correlation values are due to overfitting?

      The reviewer is right: the current version of the manuscript only provides limited information about parameter fitting. In the revised version of the manuscript, we included a parameter estimation and generalizability section that includes all information requested by the reviewer.

      To test whether using the MSE instead of Pearson correlation led to a similar estimated set of parameter values, we repeated the fitting using the MSE. The parameters estimated with this method (TauV, TauA, TauBim) closely followed those estimated using Pearson correlation (TauV, TauA, TauBim). Given the similarity of these results, we have chosen not to include further figures; however, this analysis is now included in the new section (pages 23-24).
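
The reviewer's concern about the two metrics can be made concrete with a small sketch: a psychometric curve and a compressed copy of it are perfectly correlated, yet they do not overlap. The lags, the logistic shape, and the compression factors below are hypothetical and are not taken from any of the fitted datasets.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


lags = [-400, -200, -100, 0, 100, 200, 400]  # hypothetical asynchronies (ms)
curve = [1.0 / (1.0 + math.exp(-lag / 100.0)) for lag in lags]
compressed = [0.25 + 0.5 * p for p in curve]  # same shape, squeezed toward 0.5

print(f"Pearson r = {pearson(curve, compressed):.3f}")
print(f"MSE       = {mse(curve, compressed):.4f}")
```

Because Pearson correlation is invariant to linear rescaling, only the MSE registers the compression at the tails, which is why repeating the fit with the MSE is a useful check.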

      Regarding the permutation test, it is expected that different stimuli produce analogous psychometric functions: after all, all studies relied on stimuli containing identical manipulation of lags. As a result, MCD population responses tend to be similar across experiments. Therefore, it is not a surprise that the permuted distribution of MCD-data correlation in Supplementary Figure 1K has a mean as high as 0.97. However, what is important is to demonstrate that the non-permuted dataset has an even higher goodness of fit. Supplementary Figure 1K demonstrates that none of the permuted stimuli could outperform the non-permuted dataset; the mean of the non-permuted distribution is 4.7 standard deviations above the mean of the already high permuted distribution.

      We believe the new section, along with the present response, fully addresses the legitimate concerns of the reviewer.

      While the model boasts incredible versatility across tasks and stimulus configurations, fitting behavioral data well doesn't mean we've captured the underlying neural processes, and thus, we need to be careful when interpreting results. For example, the model produces temporal parameters fitting rat behavior that are 4x faster than when fitting human data. This difference in slope and a difference at the tails were interpreted as differences in perceptual sensitivity related to general processing speeds of the rat, presumably related to brain/body size differences. While rats no doubt have these differences in neural processing speed/integration windows, it seems reasonable that a lot of the differences in human and rat psychometric functions could be explained by the (over)training and motivation of rats to perform on every trial for a reward - increasing attention/sensitivity (slope) - and a tendency to make mistakes (compression evident at the tails). Was there an attempt to fit these data with a lapse parameter built into the decisional model as was done in Equation 21? Likewise, the fitted parameters for the pharmacological manipulations during the SJ task indicated differences in the decisional (but not the perceptual) process and the article makes the claim that "all pharmacologically-induced changes in audiovisual time perception" can be attributed to decisional processes "with no need to postulate changes in low-level temporal processing." However, those papers discuss actual sensory effects of pharmacological manipulation, with one specifically reporting changes to response timing. Moreover, and again contrary to the conclusions drawn from model fits to those data, both papers also report a change in psychometric slope/JND in the TOJ task after pharmacological manipulation, which would presumably be reflected in changes to the perceptual (but not the decisional) parameters.

      Fitting or predicting behaviour does not in itself demonstrate that a model captures the underlying neural computations—though it may offer valuable constraints and insights. In line with this, we were careful not to extrapolate the implications of our simulations to specific neural mechanisms.

      Temporal sensitivity is, by definition, a behavioural metric, and—as the reviewer correctly notes—its estimation may reflect a range of contributing factors beyond low-level sensory processing, including attention, motivation, and lapse rates (i.e., stimulus-independent errors). In Equation 21, we introduced a lapse parameter specifically to account for such effects in the context of monkey eye-tracking data. For the rat datasets, however, the inclusion of a lapse term was not required to achieve a close fit to the psychometric data (ρ = 0.981). While it is likely that adding a lapse component would yield a marginally better fit, the absence of single-trial data prevents us from applying model comparison criteria such as AIC or BIC to justify the additional parameter. In light of this, and to avoid unnecessary model complexity, we opted not to include a lapse term in the rat simulations.
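
      As an illustration of what a lapse term does to a psychometric function, here is a minimal sketch. It assumes a cumulative-Gaussian function with symmetric, stimulus-independent lapses; this is a common textbook form and is not necessarily the form of Equation 21 in the manuscript. The function names and parameter values are hypothetical.

```python
import math


def cumulative_gaussian(x, mu, sigma):
    """Standard cumulative Gaussian evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def psychometric_with_lapse(x, mu, sigma, lapse):
    """Probability of one response given stimulus value x.

    Assumption: on a fraction `lapse` of trials the observer guesses
    at random between two alternatives, regardless of the stimulus.
    """
    return lapse / 2.0 + (1.0 - lapse) * cumulative_gaussian(x, mu, sigma)


# Hypothetical audiovisual lags (ms) and parameters, for illustration only.
lags_ms = [-200, -100, 0, 100, 200]
for lapse in (0.0, 0.1):
    probs = [psychometric_with_lapse(lag, 0.0, 60.0, lapse) for lag in lags_ms]
    print(lapse, [round(p, 3) for p in probs])
```

      With a non-zero lapse rate the curve is compressed at both tails, saturating at lapse/2 and 1 - lapse/2 instead of 0 and 1, while the midpoint is unchanged.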

      With respect to the pharmacological manipulation data, we acknowledge the reviewer’s point that observed changes in slope and bias could plausibly arise from alterations at either the sensory or decisional level—or both. In our model, low-level sensory processing is instantiated by the MCD architecture, which outputs the MCDcorr and MCDlag signals that are then scaled and integrated during decision-making. Importantly, this scaling operation influences the slope of the resulting psychometric functions, such that changes in slope can arise even in the absence of any change to the MCD’s temporal filters. In our simulations, the temporal constants of the MCD units were fixed to the values estimated from the non-pharmacological condition (see parameter estimation section above), and only the decision-related parameters were allowed to vary. From this modelling perspective, the behavioural effects observed in the pharmacological datasets can be explained entirely by changes at the decisional level. However, we do not claim that such an explanation excludes the possibility of genuine sensory-level changes. Rather, we assert that our model can account for the observed data without requiring modifications to early temporal tuning.

      To rigorously distinguish sensory from decisional effects, future experiments will need to employ stimuli with richer temporal structure—e.g., temporally modulated sequences of clicks and flashes that vary in frequency, phase, rhythm, or regularity (see Fujisaki & Nishida, 2007; Denison et al., 2012; Parise & Ernst, 2016, 2025; Locke & Landy, 2017; Nidiffer et al., 2018). Such stimuli engage the MCD in a more stimulus-dependent manner, enabling a clearer separation between early sensory encoding and later decision-making processes. Unfortunately, the current rat datasets—based exclusively on single click-flash pairings—lack the complexity needed for such disambiguation. As a result, while our simulations suggest that the observed pharmacologically induced effects can be attributed to changes in decision-level parameters, they do not rule out concurrent sensory-level changes.

      In summary, our results indicate that changes in the temporal tuning of MCD units are not necessary to reproduce the observed pharmacological effects on audiovisual timing behaviour. However, we do not assert that such changes are absent or unnecessary in principle. Disentangling sensory and decisional contributions will ultimately require richer datasets and experimental paradigms designed specifically for this purpose. We have now modified the results section (page 6) and the discussion (page 11) to clarify these points.

      The case for the utility of a stimulus-computable model is convincing (as I mentioned above), but its framing as mission-critical for understanding multisensory perception is overstated, I think. The line for what is "stimulus computable" is arbitrary and doesn't seem to be followed in the paper. A strict definition might realistically require inputs to be, e.g., the patterns of light and sound waves available to our eyes and ears, while an even more strict definition might (unrealistically) require those stimuli to be physically present and transduced by the model. A reasonable looser definition might allow an abstract and low-dimensional representation of the stimulus, such as the stimulus envelope (which was used in the paper), to be an input. Ultimately, some preprocessing of a stimulus does not necessarily confound interpretations about (multi)sensory perception. And on the flip side, the stimulus-computable aspect doesn't necessarily give the model supreme insight into perception. For example, the MCD model was "confused" by the stimuli used in our 2018 paper (Nidiffer et al., 2018; Parise & Ernst, 2025). In each of our stimuli, the onset and offset drove strong AV temporal correlations across all stimulus conditions (including catch trials), but were irrelevant to participants performing an amplitude modulation detection task. The to-be-detected amplitude modulations, set at individual thresholds, were not a salient aspect of the physical stimulus, and thus only marginally affected stimulus correlations. The model was, of course, able to fit our data by "ignoring" the on/offsets (i.e., requiring human intervention), again highlighting that the model is tapping into a very basic and ubiquitous computational principle of (multi)sensory perception. But it does reveal a limitation of such a stimulus-computable model: that it is (so far) strictly bottom-up.

      We appreciate the reviewer’s thoughtful engagement with the concept of stimulus computability. We agree that the term requires careful definition and should not be taken as a guarantee of perceptual insight or neural plausibility. In our work, we define a model as “stimulus-computable” if all its inputs are derived directly from the stimulus, rather than from experimenter-defined summary descriptors such as temporal lag, spatial disparity, or cue reliability. In the context of multisensory integration, this implies that a model must account not only for how cues are combined, but also for how those cues are extracted from raw inputs—such as audio waveforms and visual contrast sequences.

      This distinction is central to our modelling philosophy. While ideal observer models often specify how information should be combined once identified, they typically do not address the upstream question of how this information is extracted from sensory input. In that sense, models that are not stimulus-computable leave out a key part of the perceptual pipeline. We do not present stimulus computability as a marker of theoretical superiority, but rather as a modelling constraint that is necessary if one’s aim is to explain how structured sensory input gives rise to perception. This is a view that is also explicitly acknowledged and supported by Reviewer 2.

      Framed in Marr’s (1982) terms, non–stimulus-computable models tend to operate at the computational level, defining what the system is doing (e.g., computing a maximum likelihood estimate), whereas stimulus-computable models aim to function at the algorithmic level, specifying how the relevant representations and operations might be implemented. When appropriately constrained by biological plausibility, such models may also inform hypotheses at the implementational level, pointing to potential neural substrates that could instantiate the computation.

      Regarding the reviewer’s example illustrating a limitation of the MCD model, we respectfully note that the account appears to be based on a misreading of our prior work. In Parise & Ernst (2025), where we simulated the stimuli from Nidiffer et al. (2018), the MCD model reproduced participants’ behavioural data without any human intervention or adjustment. The model was applied in a fully bottom-up, stimulus-driven manner, and its output aligned with observer responses as-is. We suspect the confusion may stem from analyses shown in Figure 6 - Supplement Figure 5 of Parise & Ernst (2025), where we investigated the lack of a frequency-doubling effect in the Nidiffer et al. data. However, those analyses were based solely on the Pearson correlation between auditory and visual stimulus envelopes and did not involve the MCD model. No manual exclusion of onset/offset events was applied, nor was the MCD used in those particular figures. We also note that Parise & Ernst (2025) is a separate, already published study and is not the manuscript currently under review. 

      In summary, while we fully agree that stimulus computability does not resolve all the complexities of multisensory perception (see comments below about speech), we maintain that it provides a valuable modelling constraint—one that enables robust, generalisable predictions when appropriately scoped. 

      The manuscript rightly chooses to focus a lot of the work on speech, fitting the MCD model to predict behavioral responses to speech. The range of findings from AV speech experiments that the MCD can account for is very convincing. Given the provided context that speech is "often claimed to be processed via dedicated mechanisms in the brain," a statement claiming a "first end-to-end account of multisensory perception," and findings that the MCD model can account for speech behaviors, it seems the reader is meant to infer that energetic correlation detection is a complete account of speech perception. I think this conclusion misses some facets of AV speech perception, such as integration of higher-order, non-redundant/correlated speech features (Campbell, 2008) and also the existence of top-down and predictive processing that aren't (yet!) explained by MCD. For example, one important benefit of AV speech is interactions on linguistic processes - how complementary sensitivity to articulatory features in the auditory and visual systems (Summerfield, 1987) allow constraint of linguistic processes (Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      We thank the reviewer for their thoughtful comments, and especially for the kind words describing the range of findings from our AV speech simulations as “very convincing.”

      We would like to clarify that it is not our view that speech perception can be reduced to energetic correlation detection. While the MCD model captures low- to mid-level temporal dependencies between auditory and visual signals, we fully agree that a complete account of audiovisual speech perception must also include higher-order processes—including linguistic mechanisms and top-down predictions. These are critical components of AV speech comprehension, and lie beyond the scope of the current model.

      Our use of the term “end-to-end” is intended in a narrow operational sense: the model transforms raw audiovisual input (i.e., audio waveforms and video frames) directly into behavioural output (i.e., button press responses), without reliance on abstracted stimulus parameters such as lag, disparity or reliability. It is in this specific technical sense that the MCD offers an end-to-end model. We have revised the manuscript to clarify this usage to avoid any misunderstanding.

      In light of the reviewer’s valuable point, we have now edited the Discussion to acknowledge the importance of linguistic processes (page 13) and to clarify what we mean by end-to-end account (page 11). We agree that future work will need to explore how stimulus-computable models such as the MCD can be integrated with broader frameworks of linguistic and predictive processing (e.g., Summerfield, 1987; Campbell, 2008; Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      References

      Campbell, R. (2008). The processing of audio-visual speech: empirical and neural bases. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1001-1010. https://doi.org/10.1098/rstb.2007.2155

      Nidiffer, A. R., Diederich, A., Ramachandran, R., & Wallace, M. T. (2018). Multisensory perception reflects individual differences in processing temporal correlations. Scientific Reports, 8(1), 1-15. https://doi.org/10.1038/s41598-018-32673-y

      Parise, C. V., & Ernst, M. O. (2025). Multisensory integration operates on correlated input from unimodal transient channels. eLife, 12. https://doi.org/10.7554/ELIFE.90841

      Peelle, J. E., & Sommers, M. S. (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169-181. https://doi.org/10.1016/j.cortex.2015.03.006

      Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). Lawrence Erlbaum Associates.

      Tye-Murray, N., Sommers, M., & Spehar, B. (2007). Auditory and Visual Lexical Neighborhoods in Audiovisual Speech Perception. Trends in Amplification, 11(4), 233-241. https://doi.org/10.1177/1084713807307409

      Reviewer #2 (Public review):

      Summary:

      Building on previous models of multisensory integration (including their earlier correlation-detection framework used for non-spatial signals), the authors introduce a population-level Multisensory Correlation Detector (MCD) that processes raw auditory and visual data. Crucially, it does not rely on abstracted parameters, as is common in normative Bayesian models, but rather works directly on the stimulus itself (i.e., individual pixels and audio samples). By systematically testing the model against a range of experiments spanning human, monkey, and rat data, the authors show that their MCD population approach robustly predicts perception and behavior across species with a relatively small (0-4) number of free parameters.

      Strengths:

      (1) Unlike prior Bayesian models that used simplified or parameterized inputs, the model here is explicitly computable from full natural stimuli. This resolves a key gap in understanding how the brain might extract "time offsets" or "disparities" from continuously changing audio-visual streams.

      (2) The same population MCD architecture captures a remarkable range of multisensory phenomena, from classical illusions (McGurk, ventriloquism) and synchrony judgments, to attentional/gaze behavior driven by audio-visual salience. This generality strongly supports the idea that a single low-level computation (correlation detection) can underlie many distinct multisensory effects.

      (3) By tuning model parameters to different temporal rhythms (e.g., faster in rodents, slower in humans), the MCD explains cross-species perceptual data without reconfiguring the underlying architecture.

      We thank the reviewer for their positive evaluation of the manuscript, and particularly for highlighting the significance of the model's stimulus-computable architecture and its broad applicability across species and paradigms. Please find our responses to the individual points below.

      Weaknesses:

      (1) The authors show how a correlation-based model can account for the various multisensory integration effects observed in previous studies. However, a comparison of how the two accounts differ would shed light on the correlation model being an implementation of the Bayesian computations (different levels in Marr's hierarchy) or making testable predictions that can distinguish between the two frameworks. For example, how uncertainty in the cue combined estimate is also the harmonic mean of the unimodal uncertainties is a prediction from the Bayesian model. So, how the MCD framework predicts this reduced uncertainty could be one potential difference (or similarity) to the Bayesian model.

      We fully agree with the reviewer that a comparison between the correlation-based MCD model and Bayesian accounts is valuable—particularly for clarifying how the two frameworks differ conceptually and where they may converge.

      As noted in the revised manuscript, the key distinction lies in the level of analysis described by Marr (1982). Bayesian models operate at the computational level, describing what the system is aiming to compute (e.g., optimal cue integration). In contrast, the MCD functions at the algorithmic level, offering a biologically plausible mechanism for how such integration might emerge from stimulus-driven representations.

      In this context, the MCD provides a concrete, stimulus-grounded account of how perceptual estimates might be constructed—potentially implementing computations with Bayesian-like characteristics (e.g., reduced uncertainty, cue weighting). Thus, the two models are not mutually exclusive but can be seen as complementary: the MCD may offer an algorithmic instantiation of computations that, at the abstract level, resemble Bayesian inference.

      We have now updated the manuscript to explicitly highlight this relationship (pages 2 and 11). In the revised manuscript, we also included a new figure (Figure 5) and movie (Supplementary Movie 3), to show how the present approach extends previous Bayesian models for the case of cue integration (i.e., the ventriloquist effect).

      (2) The authors show a good match for cue combination involving 2 cues. While Bayesian accounts provide a direction for extension to more cues (also seen empirically, e.g., in Hecht et al. 2008), a discussion of how the MCD model extends to more cues would benefit readers.

      We thank the reviewer for this insightful comment: extending the MCD model to include more than two sensory modalities is a natural and valuable next step. Indeed, one of the strengths of the MCD framework lies in its modularity. Let us consider the MCDcorr output (Equation 6), which is computed as the pointwise product of transient inputs across modalities. Extending this to include a third modality, such as touch, is straightforward: MCD units would simply multiply the transient channels from all three modalities, effectively acting as trimodal coincidence detectors that respond when all inputs are aligned in time and space.

      By contrast, extending MCDlag is less intuitive, due to its reliance on opponency between two subunits (via subtraction). A plausible solution is to compute MCDlag in a pairwise fashion (e.g., AV, VT, AT), capturing relative timing across modality pairs.

      Importantly, the bulk of the spatial integration in our framework is carried by MCDcorr, which generalises naturally to more than two modalities. We have now formalised this extension and included a graphical representation in a supplementary section of the revised manuscript.
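
      The multiplicative extension described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: each modality is reduced to an already-filtered transient time series of equal length, the spatial dimension and the temporal filters of the actual MCD units are omitted, and the function name and sample values are hypothetical rather than taken from the manuscript or its code repository.

```python
def mcd_corr(*transient_channels):
    """Pointwise product of transient signals across modalities.

    Assumption: every channel is a same-length sequence of
    already-filtered transient responses (one value per time step).
    """
    lengths = {len(channel) for channel in transient_channels}
    if len(lengths) != 1:
        raise ValueError("need at least one channel, all of equal length")
    output = []
    for samples in zip(*transient_channels):
        product = 1.0
        for sample in samples:
            product *= sample
        output.append(product)
    return output


# Hypothetical transient responses for three modalities.
audio = [0.0, 1.0, 0.5, 0.0]
video = [0.0, 0.8, 0.5, 1.0]
touch = [1.0, 0.5, 0.0, 1.0]

bimodal = mcd_corr(audio, video)          # audiovisual coincidence
trimodal = mcd_corr(audio, video, touch)  # responds only when all three align
print(bimodal)
print(trimodal)
```

      In this toy case the trimodal output is non-zero only at the time step where all three channels are active together, which is the coincidence-detector behaviour the response describes.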

      Likely Impact and Usefulness:

      The work offers a compelling unification of multiple multisensory tasks- temporal order judgments, illusions, Bayesian causal inference, and overt visual attention - under a single, fully stimulus-driven framework. Its success with natural stimuli should interest computational neuroscientists, systems neuroscientists, and machine learning scientists. This paper thus makes an important contribution to the field by moving beyond minimalistic lab stimuli, illustrating how raw audio and video can be integrated using elementary correlation analyses.

      Reviewer #1 (Recommendations for the authors):

      Recommendations:

      My biggest concern is a lack of specificity about model fitting, which is assuaged by the inclusion of sufficient detail to replicate the analysis completely or the inclusion of the analysis code. The code availability indicates a script for the population model will be included, but it is unclear if this code will provide the fitting details for the whole of the analysis.

      We thank the reviewer for raising this important point. A new methodological section has been added to the manuscript, detailing the model fitting procedures used throughout the study. In addition, the accompanying code repository now includes MATLAB scripts that allow full replication of the spatiotemporal MCD simulations.

      Perhaps it could be enlightening to re-evaluate the model with a measure of error rather than correlation? And I think many researchers would be interested in the model's performance on unseen data.

      The model has now been re-evaluated using mean squared error (MSE), and the results remain consistent with those obtained using Pearson correlation. Additionally, we have clarified which parts of the study involve testing the model on unseen data (i.e., data not used to fit the temporal constants of the units). These analyses are now included and discussed in the revised fitting section of the manuscript (pages 23-24).
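
      To illustrate why an error measure can complement a correlation measure, the sketch below computes both for a toy prediction that tracks the data perfectly in shape but carries a constant offset. The data values and function names are hypothetical and are not taken from the manuscript.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_squared_error(xs, ys):
    """Mean of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical response proportions and a prediction offset by +0.2.
observed = [0.1, 0.3, 0.5, 0.7, 0.9]
predicted = [value + 0.2 for value in observed]

r = pearson_r(observed, predicted)
mse = mean_squared_error(observed, predicted)
print(round(r, 6), round(mse, 6))
```

      Here the correlation is at ceiling while the MSE is clearly non-zero, so agreement between the two measures is informative rather than guaranteed.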

      Otherwise, my concerns involve the interpretation of findings, and thus could be satisfied with minor rewording or tempering conclusions.

      The manuscript has been revised to address these interpretative concerns, with several conclusions reworded or tempered accordingly. All changes are marked in blue in the revised version.

      Miscellanea:

      Should b0 in equation 10 be bcrit to match the below text?

      Thank you for catching this inconsistency. We have corrected Equation 10 (and also Equation 21) to use the more transparent notation bcrit instead of b0, in line with the accompanying text.

      Equation 23, should time be averaged separately? For example, if multiple people are speaking, the average correlation for those frames will be higher than the average correlation across all times.

      We thank the reviewer for raising this thoughtful and important point. In response, we have clarified the notation of Equation 23 in the revised manuscript (page 20). Specifically, we now denote the averaging operations explicitly as spatial means and standard deviations across all pixel locations within each frame.

      This equation computes the z-score of the MCD correlation value at the current gaze location, normalized relative to the spatial distribution of correlation values in the same frame. That is, all operations are performed at the frame level, not across time. This ensures that temporally distinct events are treated independently and that the final measure reflects relative salience within each moment, not a global average over the stimulus. In other words, the spatial distribution of MCD activity is re-centered and rescaled at each frame, exactly to avoid the type of inflation or confounding the reviewer rightly cautioned against.
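
      A minimal sketch of this per-frame normalisation follows. It assumes each frame is a small 2D grid of MCD correlation values and uses the population standard deviation; whether Equation 23 uses the population or sample form is not stated here, and the function name and values are hypothetical.

```python
import statistics


def gaze_salience_z(frame, gaze_row, gaze_col):
    """Z-score of the value at the gaze location within a single frame.

    Mean and standard deviation are taken over all pixel locations of
    this frame only, never across time.
    """
    values = [value for row in frame for value in row]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    if sd == 0:
        return 0.0  # flat frame: no location stands out
    return (frame[gaze_row][gaze_col] - mean) / sd


# Two hypothetical frames; the second has ten times more overall correlation.
frame_quiet = [[0.0, 0.0], [0.0, 4.0]]
frame_busy = [[0.0, 0.0], [0.0, 40.0]]

z_quiet = gaze_salience_z(frame_quiet, 1, 1)
z_busy = gaze_salience_z(frame_busy, 1, 1)
print(z_quiet, z_busy)
```

      Because each frame is re-centred and rescaled on its own, the two frames give the same score despite the tenfold difference in raw correlation, which is the property the response relies on.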

      Reviewer #2 (Recommendations for the authors):

      The authors have done a great job of providing a stimulus computable model of cue combination. I had just a few suggestions to strengthen the theoretical part of the paper:

      (1) While the authors have shown a good match between MCD and cue combination, some theoretical justification or equivalence analysis would benefit readers on how the two relate to each other. Something like Zhang et al. 2019 (which is for motion cue combination) would add to the paper.

      We agree that it is important to clarify the theoretical relationship between the Multisensory Correlation Detector (MCD) and normative models of cue integration, such as Bayesian combination. In the revised manuscript, we have now modified the introduction and added a paragraph in the Discussion addressing this link more explicitly. In brief, we see the MCD as an algorithmic-level implementation (in Marr’s terms) that may approximate or instantiate aspects of Bayesian inference.

      (2) Simulating cue combination for tasks that require integration of more than two cues (visual, auditory, haptic cues) would more strongly relate the correlation model to Bayesian cue combination. If that is a lot of work, at least discussing this would benefit the paper.

      This point has now been addressed, and a new paragraph discussing the extension of the MCD model to tasks involving more than two sensory modalities has been added to the Discussion section.

    1. So now, in many ways, humanities majors can produce some of the most interesting “code.”

      This is the "education" that Mollick suggests is best for the most effective use of A.I.

    1. Broadly speaking, virtualization software is one member of a class that also includes emulation. Emulation, which involves simulating computer hardware in software, is typically used when the source CPU type is different from the target CPU type. For example, when Apple switched from the IBM Power CPU to the Intel x86 CPU for its desktop and laptop computers, it included an emulation facility called “Rosetta,” which allowed applications compiled for the IBM CPU to run on the Intel CPU. That same concept can be extended to allow an entire operating system written for one platform to run on another. Emulation comes at a heavy price, however. Every machine-level instruction that runs natively on the source system must be translated to the equivalent function on the target system, frequently resulting in several target instructions. If the source and target CPUs have similar performance levels, the emulated code may run much more slowly than the native code.

      This passage distinguishes virtualization from emulation, both of which allow software designed for one system to run on another:

      Emulation: Simulates the hardware of one system in software running on another system.

      Use case: Running software compiled for one CPU on a different CPU architecture (e.g., Apple’s Rosetta translating PowerPC instructions for the Intel x86).

      Drawback: Performance overhead is high because every source instruction must be translated into one or more target instructions, slowing execution compared with native code.

      Virtualization vs. Emulation: Unlike emulation, virtualization typically runs the guest on the same CPU architecture as the host, so the performance overhead is lower. Emulation is essential when the host and guest architectures differ.

    2. A process is the unit of work in a system. A system consists of a collection of processes, some of which are operating-system processes (those that execute system code) and the rest of which are user processes (those that execute user code). All these processes can potentially execute concurrently—by multiplexing on a single CPU core—or in parallel across multiple CPU cores.

      A process is the fundamental unit of work within a computer system. A system runs many processes, comprising operating-system processes (which execute system code) and user processes (which execute user code). On a single CPU core, processes take turns quickly through multiplexing, which gives concurrency, whereas across multiple CPU cores they can execute truly in parallel.
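
      As a rough illustration of multiplexing on a single core, the sketch below simulates round-robin time slicing. It is a toy model, not how any particular operating system schedules processes; the process names, burst times, and quantum are hypothetical.

```python
from collections import deque


def round_robin(burst_times, quantum):
    """Simulate single-core multiplexing with fixed time slices.

    burst_times maps a process name to the CPU time it still needs.
    Returns the order in which slices were granted as (name, duration).
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    ready = deque(burst_times.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the ready queue.
            ready.append((name, remaining - run))
    return timeline


# One operating-system process and two user processes share one core.
timeline = round_robin({"os_daemon": 3, "editor": 5, "browser": 2}, quantum=2)
for name, duration in timeline:
    print(name, duration)
```

      Only one process runs at any instant, yet all of them make progress by taking turns; with several cores, slices from different processes could instead overlap in time.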

    3. Since the operating system and its users share the hardware and software resources of the computer system, a properly designed operating system must ensure that an incorrect (or malicious) program cannot cause other programs—or the operating system itself—to execute incorrectly. In order to ensure the proper execution of the system, we must be able to distinguish between the execution of operating-system code and user-defined code. The approach taken by most computer systems is to provide hardware support that allows differentiation among various modes of execution.

      Since the operating system and user applications share the same computer, it is essential for the OS to safeguard itself and other programs from faulty or malicious code. To achieve this, most systems rely on hardware-supported execution modes that distinguish operating-system code from user code. This ensures that user applications cannot inadvertently or deliberately disturb the OS or other applications, preserving the system's stability and security.

    4. In the early days of modern computing (that is, the 1950s), software generally came with source code. The original hackers (computer enthusiasts) at MIT's Tech Model Railroad Club left their programs in drawers for others to work on. “Homebrew” user groups exchanged code during their meetings. Company-specific user groups, such as Digital Equipment Corporation's DECUS, accepted contributions of source-code programs, collected them onto tapes, and distributed the tapes to interested members. In 1970, Digital's operating systems were distributed as source code with no restrictions or copyright notice. Computer and software companies eventually sought to limit the use of their software to authorized computers and paying customers. Releasing only the binary files compiled from the source code, rather than the source code itself, helped them to achieve this goal, as well as protecting their code and their ideas from their competitors. Although the Homebrew user groups of the 1970s exchanged code during their meetings, the operating systems for hobbyist machines (such as CPM) were proprietary. By 1980, proprietary software was the usual case

      I understand that in the early days of computing, software was something people freely shared so that everyone could learn from and improve it. Groups like MIT’s Tech Model Railroad Club and DECUS made coding a collaborative activity, and even big companies like Digital allowed open access to their operating systems. But later, especially in the 1970s, companies realized that software could be sold and also needed protection from competitors, so they started giving only binary files instead of source code. This change meant users could run the software but not see how it worked or modify it. By the 1980s, most software became proprietary, which shows how the focus shifted from open collaboration to business and profit.

    5. The free-software movement is driving legions of programmers to create thousands of open-source projects, including operating systems. Sites like http://freshmeat.net/ and http://distrowatch.com/ provide portals to many of these projects. As we stated earlier, open-source projects enable students to use source code as a learning tool. They can modify programs

      The free software movement encourages programmers to create many open source projects, including operating systems. These projects are helpful for students because they can read the source code, learn from it, and even change it to practice. Websites like Freshmeat.net and DistroWatch.com are very useful because they collect and share lots of information about open-source software. Freshmeat.net lets people find new software updates, while DistroWatch.com gives details and comparisons of different Linux distributions. Both websites are good resources for learning, exploring, and supporting the open source community.

    6. A large portion of operating system code is dedicated to managing I/O, both because of its importance to the reliability and performance of a system and because of the varying nature of the devices.

      This passage highlights that a significant part of an operating system focuses on I/O management. The diversity of devices and the critical role of I/O in system performance and reliability make it a major responsibility of the OS.

    7. To explain this diversity, we can turn to the history of computers. Although computers have a relatively short history, they have evolved rapidly. Computing started as an experiment to determine what could be done and quickly moved to fixed-purpose systems for military uses, such as code breaking and trajectory plotting, and governmental uses, such as census calculation. Those early computers evolved into general-purpose, multifunction mainframes, and that's when operating systems were born. In the 1960s, Moore's Law predicted that the number of transistors on an integrated circuit would double every 18 months, and that prediction has held true. Computers gained in functionality and shrank in size, leading to a vast number of uses and a vast number and variety of operating systems. (See Appendix A for more details on the history of operating systems.)

      This section traces the evolution of computers to explain the diversity of operating systems. Early fixed-purpose machines for military and government tasks eventually gave way to general-purpose mainframes, prompting the development of operating systems. With rapid advancements in hardware—predicted by Moore’s Law—computers became more powerful and compact, enabling a wide variety of applications and a corresponding variety of OS designs.

    8. Although eBPF provides a rich set of features for tracing within the Linux kernel, it traditionally has been very difficult to develop programs using its C interface. BCC was developed to make it easier to write tools using eBPF by providing a front-end interface in Python. A BCC tool is written in Python and it embeds C code that interfaces with the eBPF instrumentation, which in turn interfaces with the kernel. The BCC tool also compiles the C program into eBPF instructions and inserts it into the kernel using either probes or tracepoints, two techniques that allow tracing events in the Linux kernel.

      I like how BCC simplifies using eBPF by letting developers write their tools in Python while embedding C for the kernel-level tracing. It's interesting that Python acts as the front end, compiling the C code into eBPF instructions and inserting them into the kernel. I'm curious how probes and tracepoints differ in practice when monitoring kernel events.
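
      The structure described in the passage (a Python front end that embeds C, compiles it to eBPF, and attaches it with a probe) can be sketched roughly as follows. This is a minimal sketch, assuming the `bcc` Python package is installed and the script runs as root on a Linux kernel with eBPF support. The names `BPF_PROGRAM`, `trace_clone`, and `run_tracer` are made up for illustration.

```python
# The kernel-side part: a small C function that BCC compiles to eBPF.
# A raw string keeps the "\n" escape intact for the C compiler.
BPF_PROGRAM = r"""
int trace_clone(void *ctx) {
    bpf_trace_printk("clone() called\n");
    return 0;
}
"""


def run_tracer():
    """Compile the embedded C and attach it to the clone() syscall.

    Assumption: bcc is installed and the caller has root privileges.
    The import is done here so the module loads without bcc present.
    """
    from bcc import BPF

    b = BPF(text=BPF_PROGRAM)  # compiles the C text into eBPF instructions
    # Attach with a kprobe: dynamic instrumentation of a kernel function.
    b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="trace_clone")
    b.trace_print()  # stream the kernel trace output until interrupted
```

      Calling `run_tracer()` as root should print a line each time a process is created. On the closing question: a kprobe like the one above hooks an arbitrary kernel function at run time, so it can break when that function is renamed or inlined in a later kernel, whereas a tracepoint is a static hook that kernel developers placed and maintain on purpose. BCC exposes the latter through `attach_tracepoint`.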

    9. Debugging the interactions between user-level and kernel code is nearly impossible without a toolset that understands both sets of code and can instrument their interactions. For that toolset to be truly useful, it must be able to debug any area of a system, including areas that were not written with debugging in mind, and do so without affecting system reliability. This toolset must also have a minimal performance impact—ideally it should have no impact when not in use and a proportional impact during use. The BCC toolkit meets these requirements and provides a dynamic, secure, low-impact debugging environment.

      Debugging seems incredibly complex when user-level and kernel code interact. It's impressive that the BCC toolkit can dynamically monitor and debug the system with minimal performance impact. I'm curious how it ensures security while accessing parts of the system that weren't designed for debugging.

    10. Operating-system debugging and process debugging frequently use different tools and techniques due to the very different nature of these two tasks. Consider that a kernel failure in the file-system code would make it risky for the kernel to try to save its state to a file on the file system before rebooting. A common technique is to save the kernel's memory state to a section of disk set aside for this purpose that contains no file system. If the kernel detects an unrecoverable error, it writes the entire contents of memory, or at least the kernel-owned parts of the system memory, to the disk area. When the system reboots, a process runs to gather the data from that area and write it to a crash dump file within a file system for analysis. Obviously, such strategies would be unnecessary for debugging ordinary user-level processes.

      It's interesting that kernel debugging has to plan for the possibility that the file system itself is broken. Writing the memory state to a dedicated disk area outside the file system is clever: it ensures that crash data isn't lost even if the file-system code is what failed.

    11. Debugging user-level process code is a challenge. Operating-system kernel debugging is even more complex because of the size and complexity of the kernel, its control of the hardware, and the lack of user-level debugging tools. A failure in the kernel is called a crash. When a crash occurs, error information is saved to a log file, and the memory state is saved to a crash dump.

      It's overwhelming to see how much harder it is to debug the kernel than regular programs. The idea of a "crash dump" makes sense: capturing the kernel's memory state seems essential, since normal user-level tools can't reach it.

    1. There are two types of errors: (1) syntax errors (i.e., code that does not follow the rules of the language) and (2) logic errors (i.e., the program executes but generates wrong results). <- These types of runtime errors are also called exceptions.
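
      A short Python sketch may help keep these categories apart. It assumes Python is the language under discussion, and the function names are made up for illustration. One thing it shows is that a logic error raises nothing at all. In Python, "exception" refers to an error detected while the program is running, which is a separate case from a program that runs to completion with a wrong answer.

```python
# 1. Syntax error: the source breaks the language rules, so it never runs.
bad_source = "print('hello'"  # missing closing parenthesis
try:
    compile(bad_source, "<example>", "exec")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False


# 2. Logic error: the code runs without complaint but the result is wrong.
def average_buggy(values):
    return sum(values) / (len(values) + 1)  # bug: divides by one too many


def average(values):
    return sum(values) / len(values)  # intended behaviour


# 3. Runtime error (exception): valid syntax, but execution fails part-way.
try:
    average([])  # dividing by len([]) == 0 raises ZeroDivisionError
    raised = None
except ZeroDivisionError as exc:
    raised = type(exc).__name__
```

      Here `average_buggy([2, 4, 6])` returns 3.0 instead of 4.0 and nothing signals the mistake, while `average([])` stops with a `ZeroDivisionError` that the program can catch and handle.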

    1. You can see where we're going. If our goal is to minimize copying, it would be better to copy a fundamental type once than to generate a pointer, copy that, then dereference that pointer to get the underlying value. That is the crux of this subtle optimization trick.

      This isn't subtle. It's intuitive and obvious, being presented as if it's non-obvious. I kept waiting for the punchline.

      I guess you can get here if your main mode of thinking is dealing in opaque "best practices" like "use references because it makes your code faster".