  1. Last 7 days
    1. The handover of the bodies of deceased hostages in the Gaza Strip could take days or weeks, according to the International Committee of the Red Cross (ICRC). Given the destruction of buildings and infrastructure in the Gaza Strip, it is in some cases difficult to locate the remains, ICRC spokesperson Christian Cardon said in Geneva.


    1. “Tips for Skimming Books and Articles” explains how skimming can help you obtain a quick sense of what topics are covered.

      Quickly looking over a book or article can help you figure out its main ideas without reading everything in detail, especially if you're kind of in a rush.

    1. no limitation period runs.
      • Note: The CLT is more protective than the Civil Code regarding the limitation of rights of minors under 18.

      • Under the Civil Code, only the absolutely incapable (< 16 years - art. 198, I, CC) are shielded from the running of the limitation period! Under the CLT, the limitation period for labor rights only begins to run when the person turns 18.

  2. bafybeie4ygrjpv2sciogqoqdg3o7bon5jj6bay5gzj647gofgwaunne5fa.ipfs.localhost:8080
    1. “Reasoning is but reckoning,” said Hobbes (1651, ch. V), in the earliest expression of

      Reasoning is more like illation

      injecting the mechanical view

      Must read Hobbes

      there is more to reasoning than what can be captured in discourse and rational arguments

    1. implementing DeFi into the NFT Marketplace

      A DeFi-integrated NFT marketplace opens doors to enhanced liquidity, NFT collateralization, and passive income opportunities. Discover how DeFi protocols can revolutionize digital asset trading and ownership, along with the key benefits and strategies for implementing this innovation effectively. Read our blog, Build NFT Marketplace Powered by DeFi Protocols, to explore more.

    1. the content will still be presented in an objective style and formal tone

      Even though the writing uses specialized terms, it’s still written in a fair and professional way. The authors don’t let their personal feelings show, and they keep the language serious and formal.

    2. When you search for periodicals, be sure to distinguish among different types.

      Each type is different: some are more reliable or detailed than others. Knowing the difference helps me pick the right sources for my research, so I can get a more accurate answer and understanding.

    3. To locate shorter sources, such as magazine and journal articles, you will need to use an online database.

      A database is pretty useful: it organizes the information and makes it easier to look up specific topics, authors, or dates, so I can find what I need for my research more efficiently.

    4. Ask yourself which sources are most likely to provide answers to your research questions.

      It’s important to know, based on the sources, what information has been gathered and what is actually true about that topic.

    5. Secondary sources discuss, interpret, analyze, consolidate, or otherwise rework information from primary sources.

      These are materials that take information from original sources and then explain or discuss it in more detail. They focus mainly on the original content to help people understand it better.

    6. Primary sources are direct, firsthand sources of information or data

      Basically, original materials or evidence that come straight from the people or events being studied.

    1. With regard to the scope of liability borne by the debtor

      types of debtor liability: 1. personal - liable with all of their assets 2. in rem (property-based) - the obligation to tolerate the creditor's satisfaction from a specific asset

      sometimes both apply at once - e.g., a mortgage

    2. natural obligation

      a debt exists, but without liability => the creditor may demand performance, but if, e.g., the debtor invokes the statute of limitations, the claim loses its enforceability => the debtor may still perform voluntarily, but the creditor has no judicial protection

    3. Entitlements

      types of entitlements: 1. claims - a specific entitled person X may demand that another party render performance to X 2. formative rights - the entitled person X has the power to modify or terminate a legal relationship through a unilateral legal act 3. defenses - the right to refuse to satisfy a claim a) peremptory (permanent) - effect: the claim can no longer be pursued at any time (e.g., limitation) b) dilatory (temporary) - effect: pursuit of the claim is restricted, but only for a specified period

    1. eLife Assessment

      The study reports a potential pathway for isoleucine biosynthesis mediated by the underground activity of AHASII, which converts glyoxylate and pyruvate to 2-ketobutyrate. While the findings are valuable in revealing a possible alternative route for isoleucine production, the evidence presented remains incomplete. More comprehensive biochemical experiments are required to substantiate the physiological feasibility of this pathway.

    2. Reviewer #1 (Public review):

      As presented in this short report, the focus is only to establish that acetohydroxyacid synthase II can have underground activity to generate 2-ketobutyrate (from glyoxylate and pyruvate). Additionally, the gene that encodes this protein has an inactivating point mutation in the lab strain of E. coli. In strains lacking the conventional Ile biosynthesis pathway, this enzyme gets reactivated (after short-term laboratory evolution) and putatively can contribute to producing sufficient 2-ketobutyrate, which can feed into Ile production. This is clearly a very interesting observation and finding, and the paper focuses on this single point.

      However, the manuscript as it currently stands is 'minimal', and just barely shows that this reaction/pathway is feasible. There is no characterization of the restored enzyme's activity, rate, or specificity. Additionally, there is no data presented on how much isoleucine can be produced, even at saturating concentrations of glyoxylate or pyruvate. This would greatly benefit from more rigorous characterization of this enzyme's activity and function, as well as better demonstration of how effective this pathway is in generating 2-ketobutyrate (and then its subsequent condensation with pyruvate).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Rainaldi et al. reports a new sub-pathway for isoleucine biosynthesis by demonstrating the promiscuous activity of the native enzyme acetohydroxyacid synthase II (AHAS II). AHAS-II is primarily known to catalyze the condensation of 2-ketobutyrate (2KB) with pyruvate to form a further downstream intermediate, AHB, in the isoleucine biosynthesis pathway. However, the catalysis of pyruvate and glyoxylate condensation to produce 2KB via the ilvG-encoded AHAS II is reported in this manuscript for the first time.

      Using an isoleucine/2KB auxotrophic E. coli strain, the authors report that (i) repair of the inactivating frameshift mutation in the ilvG gene, which encodes AHAS-II, supports growth in glyoxylate-supplemented media, (ii) the promiscuity of AHAS-II in glyoxylate and pyruvate condensation results in the formation of the isoleucine precursor 2KB, aiding isoleucine biosynthesis, and (iii) the recursive AHAS-II route is comparably efficient to the canonical routes of isoleucine biosynthesis, according to computational flux-based analysis (FBA).

      Strengths:

      The authors have used laboratory evolution to uncover a non-canonical metabolic route. The metabolomics and FBA have been used to strengthen the claim.

      Weaknesses:

      While the manuscript proposes an interesting metabolic route for isoleucine biosynthesis, the data lack key controls, biological replicates, and consistency. The figures and methods are presented inadequately. In their current state, the data fail to support the claims made in the manuscript.

    1. eLife Assessment

      The reviewers have found that this manuscript is a valuable contribution, and the evidence in support of its conclusions is mostly solid. It provides novel insights and raises interesting possibilities about the functions of an understudied histone modification within the nucleosome core; however, the data are mostly descriptive and correlative, and although this has value, it is not totally persuasive. Short of additional non-genomic experiments, a more detailed analysis of the genomic data and perhaps additional data would strengthen the conclusions. The manuscript crucially needs further antibody validation to raise confidence in the data.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of H3K115ac in mouse embryonic stem cells. They report that H3K115ac localizes to regions enriched for fragile nucleosomes, CpG islands, and enhancers, and that it correlates with transcriptional activity. These findings suggest a potential role for this globular domain modification in nucleosome dynamics and gene regulation. If robust, these observations would expand our understanding of how non-tail histone modifications contribute to chromatin accessibility and transcriptional control.

      Strengths:

      (1) The study addresses a histone PTM in the globular domain, which is relatively unexplored compared to tail modifications.

      (2) The implication of a histone PTM in fragile nucleosome localization is novel and, if substantiated, could represent a significant advance for the field.

      Weaknesses:

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      Kumar et al. aimed to assess the role of the understudied H3K115 acetylation mark, which is located in the nucleosomal core. To this end, the authors performed ChIP-seq experiments of H3K115ac in mouse embryonic stem cells as well as during differentiation into neuronal progenitor cells. Subsequent bioinformatic analyses revealed an association of H3K115ac with fragile nucleosomes at CpG island promoters, as well as with enhancers and CTCF binding sites. This is an interesting study, which provides important novel insights into the potential function of H3K115ac. However, the study is mainly descriptive, and functional experiments are missing.

      Strengths:

      (1) The authors present the first genome-wide profiling of H3K115ac and link this poorly characterized modification to fragile nucleosomes, CpG island promoters, enhancers, and CTCF binding sites.

      (2) The study provides a valuable descriptive resource and raises intriguing hypotheses about the role of H3K115ac in chromatin regulation.

      (3) The breadth of the bioinformatic analyses adds to the value of the dataset.

      Weaknesses:

      (1) I am not fully convinced of the specificity of the antibody. Although the experiment in Figure S1A shows specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment demonstrating that the antibody does not bind to unrelated proteins. Thus, a Western blot of a nuclear extract or the chromatin fraction would be critical to show. Peptide competition using the H3K115ac peptide to block the antibody would also help to further support the specificity of the antibody. In addition, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region.

      It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase sensitivity and fragment length, which are indirect measures and can be subject to technical bias. Experiments showing that H3K115ac-modified nucleosomes are indeed more fragile are missing.

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      At the very least, the authors should acknowledge these limitations and provide additional validation of antibody specificity.

    4. Reviewer #3 (Public review):

      Summary:

      Kumar et al. examine the H3K115ac epigenetic mark, located on the lateral surface of the histone core domain, and present evidence that it may serve as a marker enriched at transcription start sites (TSSs) of active CpG island promoters and at polycomb-repressed promoters. They also note that the H3K115ac mark is enriched on fragile nucleosomes within nucleosome-depleted regions, at active enhancers, and at CTCF-bound sites. They propose that these observations suggest that H3K115ac contributes to nucleosome destabilization and so may serve as a marker of functionally important regulatory elements in mammalian genomes.

      Strengths:

      The authors present novel observations suggesting that acetylation of a histone residue in the core domain (rather than on a histone tail) may play a functional role in promoting transcription at CpG islands and polycomb-repressed promoters. They present a solid amount of confirmatory in silico data, using appropriate methodology, that supports the idea that the H3K115ac mark may destabilize nucleosomes and contribute to regulating ESC differentiation.

      Weaknesses:

      Additional experiments to confirm antibody specificity are needed. The authors use synthetic peptides for other marks (e.g., H3K122) to support the claim that the antibody is specific, but ChIP assays are performed under cross-linked, non-denatured conditions, which preserve structure and epitope accessibility differently than the synthetic peptides used for dot blots. Does the antibody give a single band in western blots of histones, and can the H3K115ac peptide block western and immunofluorescence signals of the antibody? Given that the antibody is a rabbit polyclonal, specificity is not a trivial consideration.

    1. eLife Assessment

      This important study establishes bathy phytochromes, a unique class of bacterial photoreceptors that respond to near-infrared light (NIR), as versatile tools for bacterial optogenetics. NIR light is a key control signal in optogenetics due to its deep tissue penetration and the ability to combine with existing red- and blue-light sensitive systems, but thus far, NIR-activated proteins have been poorly characterized. The strength of evidence is convincing, with comprehensive in vitro characterization, modular design strategies, and validation across different hosts, supporting the versatility and potential for these tools in biotechnological applications. This study should advance the fields of optogenetics and photobiology and inspire future work.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e., those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors developed a structure-guided approach to functionally link several light-responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light-responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light-sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating the generality of their protein engineering approach more broadly across bacterial two-component systems.

      This is an exciting result that should inspire future bacterial sensor design. The authors go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.

      Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.

      Strengths:

      The experiments are well founded, well executed, and rigorous.

      The manuscript is clearly written.

      The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.

      This study is a valuable contribution to photobiology and optogenetics.

      Weaknesses:

      As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g. in vivo).

      Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g. blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.

      Comments on revisions:

      My concerns have been addressed.

    3. Reviewer #2 (Public review):

      In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.

      Strengths:

      (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.

      (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.

      (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.

      (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.

      (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.

      Comments on revisions:

      The authors have addressed all my prior concerns.

    4. Reviewer #3 (Public review):

      Summary:

      This paper by Meier et al. introduces a new optogenetic module for the regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.

      Strengths:

      The paper is important from the perspective of fundamental protein characterization, since bathy-BphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light activated systems. The experiments are performed carefully and the manuscript is well written.

      Weaknesses:

      Some of the light-inducible responses described in this compelling paper are complex and difficult to rationalize, such as the dependence of light responses on linker length and differences in responses observed from the bathy-BphPs in isolation versus strains in which they are multiplexed. Nevertheless, the authors should be commended for carrying out rigorous experiments and reporting these results accurately. These are minor weaknesses in an overall very strong paper.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e., those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light-responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light-responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light-sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating the generality of their protein engineering approach more broadly across bacterial two-component systems.

      This is an exciting result that should inspire future bacterial sensor design. They go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.

      Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate that their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.

      Strengths:

      (1) The experiments are well-founded, well-executed, and rigorous.

      (2) The manuscript is clearly written.

      (3) The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.

      (4) This study is a valuable contribution to photobiology and optogenetics.

      We thank the reviewer for the positive verdict on our manuscript.

      Weaknesses:

      (1) As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g., in vivo).

      We concur in principle with this reviewer’s assessment that delivery of light (of any color) into living tissue can be severely limited by absorption, reflection, and scattering. That notwithstanding, at least two considerations suggest that in-vivo deployment of the pNIRusk setups we presently advance may be feasible.

      First, while the pNIRusk setups are indeed less light-sensitive than, e.g., our earlier red-light-responsive pREDusk and pDERusk setups (see Meier et al. Nat Commun 2024), we note that the overall light intensities required for triggering them are in the range of tens of µW per cm². By contrast, optogenetic experiments in vivo, in particular in the neurosciences, often employ light intensities on the order of mW per cm² and above. Put another way, compared to the optogenetic tools used in these experiments, the pNIRusk setups are actually quite sensitive to light.

      Second, sensitivity to NIR light brings the advantage of superior tissue penetration, see data reported by Weissleder Nat Biotech 2001 and Ash et al. Lasers Med Sci 2017 (both papers are cited in our manuscript). Based on these data, the intensity of blue light (450 nm) falls off 5-10 times more strongly with penetration depth than that of NIR light (800 nm).

      We have added a brief treatment of these aspects in the Discussion section.

      (2) Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g., blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.

      The reviewer is correct in noting that, at least to a certain extent, the pNIRusk systems also respond to blue light owing to their Soret absorbance bands (see Fig. 1). That said, we note two points:

      First, a given photoreceptor that preferentially responds to certain wavelengths, e.g., 700 nm in the case of conventional bacterial phytochromes (BphP), generally absorbs shorter wavelengths to some degree as well. Absorption of these shorter wavelengths suffices for driving electronic and/or vibronic transitions of the chromophore to higher energy levels, which often give rise to productive photochemistry and downstream signal transduction. Put another way, some response of sensory photoreceptors to shorter wavelengths is fully expected and indeed experimentally borne out, as for instance shown by Ochoa-Fernandez et al. in the so-called PULSE setup (Nat Meth 2020, doi: 10.1038/s41592-020-0868-y).

      Second, known BphPs share similar Pr and Pfr absorbance spectra. We therefore expect other BphP-based optogenetic setups to also respond to blue light to some degree. Currently, there are insufficient data to gauge whether individual BphPs systematically differ in their relative sensitivity to blue compared to red or NIR light. Arguably, pertinent experiments may be an interesting subject for future study.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.

      Strengths:

      (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.

      (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.

      (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.

      (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.

      (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.

      Weaknesses:

      (1) The expression of multi-gene operons and fluorescent reporters could impose a metabolic burden. The authors should present data comparing optical density for growth curves of engineered strains versus the corresponding empty-vector control to provide insight into the burden and overall impact of the system on host viability and growth.

      In response to this comment, we have recorded growth kinetics of bacteria harboring the pNIRusk-DsRed plasmids or empty vectors under both inducing (i.e., under NIR light) and noninducing conditions (i.e., darkness). We did not observe systematic differences in the growth kinetics between the different cultures, thus suggesting that under the conditions tested there is no adverse effect on cell viability.

      We include the new data in Suppl. Fig. 5c-d and refer to them in the main text.

      (2) The manuscript consistently presents normalized fluorescence values, but the method of normalization is not clear (Figure 2 caption describes normalizing to the maximal fluorescence, but the maximum fluorescence of what?). The authors should provide a more detailed explanation of how the raw fluorescence data were processed. In addition, or potentially in exchange for the current presentation, the authors should include the raw fluorescence values in supplementary materials to help readers assess the actual magnitude of the reported responses.

      We appreciate this valid comment and have altered the representation of the fluorescence data. All values for a given fluorescent protein (i.e., either DsRed or YPet) across all systems are now normalized to a single reference value, thus enabling direct comparison between experiments.

      (3) Related to the prior point, it would be useful to have a positive control for fluorescence that could be used to compare results across different figure panels.

      As all data are now normalized to the same reference value, direct comparison across all figures is enabled.

      (4) Real-time gene expression data are not presented in the current manuscript, but it would be helpful to include a time-course for some of the key designs to help readers assess the speed of response to NIR light.

      In response to this comment, we include in the revised manuscript induction kinetics of bacterial cultures bearing pNIRusk upon transfer to inducing NIR-light conditions. To this end, aliquots were taken at discrete timepoints, transcriptionally and translationally arrested, and analyzed for optical density and DsRed reporter fluorescence after allowing for chromophore maturation.

      We include the new data in Suppl. Fig. 5e and refer to them in the manuscript.

      Moreover, we note that the experiments in Agrobacterium tumefaciens used a luciferase reporter, thus enabling continuous monitoring of the light-induced expression kinetics. These data (unchanged in the revision) are shown in Suppl. Fig. 9.

      Reviewer #3 (Public review):

      Summary:

      This paper by Meier et al. introduces a new optogenetic module for the regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.

      Strengths:

      The paper is important from the perspective of fundamental protein characterization, since bathyBphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light-activated systems. The experiments are performed carefully, and the manuscript is well written.

      Weaknesses:

      My major criticism is that some information is difficult to obtain, and some data is presented with limited interpretation, making it difficult to obtain intuition for why certain responses are observed. For example, the changes in red/infrared responses across different figures and cellular contexts are reported but not rationalized. Extensive experiments with variable linker sequences were performed, but the rationale for linker choices was not clearly explained. These are minor weaknesses in an overall very strong paper.

      We are grateful for the positive take on our manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) As eLife is a broad audience journal, please define the Soret and Q-bands (line 125).

      We concur and have added labels in Fig. 1a that designate the Soret and Q bands.

      (2) The initial (0) Ac design in Figure 2b is activated by NIR and Red light, albeit modestly. The authors state that this construct shows "constant reporter fluorescence, largely independent of illumination" (line 167). This language should be changed to reflect the fact that this Ac construct responds to both of these wavelengths.

      Agreed. We have amended the text accordingly.

      (3) pNIRusk Ac 0 appears to show a greater light response than pNIRusk Av -5. However, the authors claim that the former is not light-responsive and the latter is. This conclusion should be explained or changed.

      The assignment of pNIRusk Av-5 as light-responsive is based on the relative difference in reporter fluorescence between darkness and illumination with either red or NIR light. Although the overall fluorescence is much lower for Av-5 than for Av-0, the relative change upon illumination is much more pronounced. We have added a statement to this effect to the text.
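      The difference between an absolute and a relative light response can be made concrete with a small sketch. The construct names and numbers below are hypothetical and are not the measured values. A construct with high overall output can show the larger absolute change while a low-output construct shows the larger fold change.

```python
def light_response(dark, light):
    """Return the absolute and the relative (fold) change upon illumination."""
    if dark <= 0:
        raise ValueError("dark-state signal must be positive")
    return light - dark, light / dark


# Hypothetical normalized reporter values, as (dark, NIR) pairs
constructs = {
    "high-output construct": (1.0, 1.75),
    "low-output construct": (0.0625, 0.3125),
}
for name, (dark, light) in constructs.items():
    absolute, fold = light_response(dark, light)
    print(f"{name}: absolute change {absolute:.3f}, fold change {fold:.2f}")
```

      Classifying a design as light-responsive by fold change, as the response describes for Av-5, favours the second kind of construct even though its absolute signal is smaller.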

      (4) The authors state that "when combining DmDERusk-Str-YPet with AvTod+21-DsRed expression rose under red and NIR light, respectively, whereas the joint application of both light colors induced both reporter genes" (lines 258-261). In contrast, Figure 3c shows that application of both wavelengths of light results in exclusive activation of YPet expression. It appears the description of the data is wrong and must be corrected. That said, this error does not impact their conclusion that two separate target genes can be independently activated by NIR and red light.

      We thank the reviewer for catching this error which we have corrected in the revised manuscript.

      (5) Line 278: I don't agree with the authors' blanket statement that the use of upconversion nanoparticles is a "grave" limitation for NIR-light mediated activation of bacterial gene expression in vivo. The authors should either expound on the severity of the limitation or use more moderate language.

      We have replaced the word ‘grave’ by ‘potential’ and thereby toned down our wording.

      Reviewer #2 (Recommendations for the authors):

      (1) Please include a discussion on the expected depth penetration of different light wavelengths. This is most relevant in the context of the discussion about how these NIR systems could be used with living therapeutics.

      Given the heterogeneity of biological tissue, it is challenging to state precise penetration depths for different wavelengths of light. That said, blue light, for instance, is typically attenuated by biological tissue around 5 to 10 times as strongly as near-infrared light.

      We have expanded the Discussion section to cover these aspects.
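      To show what a 5- to 10-fold difference in attenuation implies, here is a minimal sketch using simple exponential (Beer-Lambert-type) decay. That model is an assumption and ignores the scattering and heterogeneity of real tissue. The NIR attenuation coefficient and the depth are hypothetical placeholders; only the 5- to 10-fold ratio comes from the response above.

```python
import math


def transmitted_fraction(mu_eff, depth):
    """Fraction of incident light left at a given depth, assuming simple
    exponential decay with an effective attenuation coefficient mu_eff."""
    return math.exp(-mu_eff * depth)


def penetration_depth(mu_eff):
    """Depth at which the intensity has dropped to 1/e of its incident value."""
    return 1.0 / mu_eff


MU_NIR = 0.2  # hypothetical effective attenuation coefficient for NIR light, 1/mm
DEPTH = 5.0   # hypothetical tissue depth, mm

for factor in (5, 10):  # blue attenuated ~5-10x as strongly as NIR
    mu_blue = factor * MU_NIR
    print(
        f"{factor}x: 1/e depth NIR {penetration_depth(MU_NIR):.1f} mm, "
        f"blue {penetration_depth(mu_blue):.2f} mm; "
        f"fraction left at {DEPTH} mm: NIR {transmitted_fraction(MU_NIR, DEPTH):.3f}, "
        f"blue {transmitted_fraction(mu_blue, DEPTH):.2e}"
    )
```

      In this idealized model the 1/e penetration depth scales inversely with the attenuation coefficient. The gap in transmitted intensity between the two wavelengths widens exponentially with depth.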

      (2) It would be helpful for Figure 2C (or supplementary) to also include the response to blue light stimulation.

      We agree and have acquired pertinent data for the blue-light response. The new data are included in an updated Fig. 2c. Data acquired at varying NIR-light intensities, originally included in Fig. 2c, have been moved to Suppl. Fig. 5a-b.

      (3) In Figure 4A, data on the response of E. coli Nissle to blue and red light are missing. Including this would help identify whether the reduced sensitivity to non-NIR wavelengths observed in the E. coli lab strain is preserved in the probiotic background.

      In response to this comment, we have acquired pertinent data on E. coli Nissle. While the results were overall similar to those in the laboratory strain, the response to blue and red light was even lower in the Nissle bacteria, which stands to benefit optogenetic applications.

      We have updated Fig. 4a accordingly. For clarity, we only show the data for AvNIRusk in the main paper but have relegated the data on AcNIRusk to Suppl. Fig. 8. (Note that this has necessitated a renumbering of the subsequent Suppl. Figs.)

      (4) On many of the figures, there are thin gray lines that appear between the panels that it would be nice to eliminate because, in some cases, they cut through words and numbers.

      The grey lines likely arose from embedding the figures into the text document. In the typeset manuscript, which has become available on the eLife webpage in the meantime, there are no such lines. That said, we will carefully check throughout the submission/publishing/proofing process lest these lines reappear.

      (5) Page 7, line 155: "As not least seen" typo or awkward phrasing.

      We have restructured the sentence and thereby hopefully clarified the unclear phrasing.

      (6) Page 7, line 167: It does not appear to be the case that the initial pNIRusk designs show constant fluorescence that is largely independent of illumination. AcNIRusk shows an almost twofold change from dark to NIR. Reword this to avoid confusion.

      We concur with this comment, which echoes reviewer #1’s remark, and have adjusted the text accordingly.

      (7) Page 8, line 174: Related to the previous point, AvNIRusk has one design that is very minimally light switchable (-5), so stating that six light switchable designs have been identified is also confusing.

      As stated in our response to reviewer #1 above, the assignment of AvNIRusk-5 as light-switchable is based on the relative fluorescence change upon illumination. We have added an explanation to the text.

      (8) Page 10, line 228-229: I was not able to find the data showing that expression levels were higher for the DmTtr systems than the pREDusk and pNIRusk setups. This may be an issue related to the normalization point. It was not clear to me how to compare these values.

      We apologize for the initially unclear representation of the data. In response to this reviewer’s general comments above, we have now normalized all fluorescence values to a single reference value, thus allowing their direct comparison.

      (9) Page 12, line 264: "finer-grained expression control can be exerted..." Either show data or adjust the language so that it is clear this is a prediction.

      True, we have replaced ‘can’ by ‘could’.

      (10) Page 25, line 590: CmpX13 cells have a reference that is given later, but it should be added where it first appears.

      Agreed, we have added the reference in the indicated place.

      (11) Page 25, line 592: define LB/Kan.

      We had already defined this abbreviation further up but, for clarity, we have added it again in the indicated position.

      (12) Page 40, line 946: "normalized by" rather than "to".

      We have implemented the requested change in the indicated and several other positions of the manuscript.

      (13) Figures 2C, 3C, and similar plots in the supplementary material would benefit from having a legend for the colors.

      We agree and have added pertinent legends to the corresponding main and supplementary figures.

      (14) As a reader, I had some trouble following all the acronyms. This is at the author's discretion, but I would eliminate ones that are not strictly essential (e.g. MTP for microtiter plate; I was unable to identify what "MCS" meant; look for other opportunities to remove acronyms).

      In the revised manuscript, we have defined the abbreviation ‘MCS’ (for ‘multiple-cloning site’) upon first occurrence. We have decided to retain the abbreviation ‘MTP’ in the text.

      (15) Could the authors briefly speculate on why A. tumefaciens activation with red light might occur?

      While we can only speculate about the underlying reasons for the divergent red-light response in A. tumefaciens, we discuss possible scenarios below.

      Commonly, two-component systems (TCS) exhibit highly cooperative and steep responses to signal. As a consequence, even small differences in the intracellular amounts of phosphorylated and unphosphorylated response regulator (RR) can give rise to significantly changed gene-expression output. Put another way, the gene-expression output need not scale linearly with the extent of RR phosphorylation but is, rather, expected to show a nonlinear dependence with pronounced thresholding effects.
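      The thresholding argument can be illustrated with a Hill function, a common phenomenological description of cooperative responses. Its use here is an assumption, not a model fitted to pNIRusk. The half-maximal point k and the Hill coefficients below are hypothetical.

```python
def hill_output(p, k=0.5, n=4.0, vmax=1.0):
    """Expression output as a Hill function of the phosphorylated-RR fraction p.

    k: hypothetical half-maximal fraction
    n: hypothetical Hill coefficient (n = 1 gives a graded, non-cooperative response)
    """
    return vmax * p**n / (k**n + p**n)


# Compare how the same modest shift in RR phosphorylation (0.4 -> 0.6) propagates
for n in (1.0, 4.0):
    low, high = hill_output(0.4, n=n), hill_output(0.6, n=n)
    print(f"n = {n}: output {low:.2f} -> {high:.2f} (change {high - low:.2f})")
```

      With the steeper curve (n = 4), the same 0.2 shift in phosphorylation changes the output several times as much as with n = 1. Small host-specific differences in RR phosphorylation could therefore plausibly produce visibly different light responses.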

      Differences in the pertinent RR levels can, for instance, arise from variations in the expression levels of the pNIRusk system components between E. coli and A. tumefaciens. Moreover, the two bacteria greatly differ in their TCS repertoire. Although TCSs are commonly well insulated from each other, cross-talk with endogenous TCSs, even if limited, may cause changes in the levels of phosphorylated RR and hence gene-expression output. In a similar vein, the RR can also be phosphorylated and dephosphorylated non-enzymatically, e.g., by reaction with high-energy anhydrides (such as acetyl phosphate) and by hydrolysis, respectively. Other potential origins for the divergent red-light response include differences in the strength of the promoters driving expression of the pNIRusk system components and the fluorescent/luminescent reporters, respectively.

      (16) It would be helpful for the authors to briefly explain why they needed to switch to luminescence from fluorescence for the A. tumefaciens studies.

      While there was no strict necessity to switch from the fluorescence-based system used in E. coli to a luminescence-based system in A. tumefaciens, we opted for luminescence based on prior experience with other Alphaproteobacteria (e.g., 10.1128/mSystems.00893-21), where luminescence offered significant advantages. Specifically, it provides essentially background-free signal detection and greater sensitivity for monitoring gene expression. In addition, as demonstrated in Suppl. Fig. 9c and d, the luminescence system enables real-time tracking of gene expression dynamics, which further supported its use in our experimental setup (see our response to reviewer #2’s general comments).

      (17) This is a very minor comment that the authors can take or leave, but I got hung up on the word "implement" when it appeared a few times in the manuscript because I tended to read it as "put a plan into place" rather than its other meaning.

      In the abstract, we have replaced one instance of the word ‘implement’ by ‘instrument’.

      (18) The authors should include the relevant constructs on Addgene or another public strain-sharing service.

      We wholeheartedly subscribe to the idea of freely sharing research materials with fellow scientists. Therefore, we had already deposited the most relevant plasmid, AvNIRusk, with Addgene even prior to the initial submission of the manuscript (accession number 235084). In the meantime, we have released the deposition, and the plasmid has been available from Addgene since May 15th of this year.

      Reviewer #3 (Recommendations for the authors):

      Suggestion for improvement:

      This paper relies heavily on variations in linker sequences to shift responses. I am familiar with prior work from the Moglich lab in which helical linkers were employed to shift responses in synthetic two-component systems, with interesting periodicity in responses with every 7 residues (as expected for an alpha helix) and inversion of responses at smaller linker shifts. There is no mention in this paper whether their current engineering follows a similar rationale, what types of linkers are employed (e.g. flexible vs helical), and whether there is an interpretation for how linker lengths alter responses. Can you explain what classes of linker sequences are used throughout Figures 2 and 3, and whether length or periodicity affects the outcome? This would be very helpful for readers who are new to this approach, or if the rationale here differs from the authors' prior work.

      The PATCHY approach employed here followed a rationale closely similar to that of our previous studies. That is, linkers were extended or shortened and varied in their sequence by recombining different fragments of the natural linkers of the parental receptors, i.e., the bacteriophytochrome and the FixL sensor histidine kinase, respectively. We have added a statement to this effect to the text, together with a reference to Suppl. Fig. 3, which illustrates the general approach.

      Compared to our earlier studies, we isolated fewer receptor variants supporting light-regulated responses, despite covering a larger sequence space. Owing to the sparsity of the light-regulated variants, correlating linker properties with light-regulated activity is challenging. Although this is doubtless unsatisfying from a mechanistic viewpoint, we therefore refrain from such a discussion, which would be premature and speculative at this point. As the reviewer raises a valid and important point, we have, however, expanded the text by referring to our earlier studies and the observed dependence of functional properties on linker composition.

      It is sometimes difficult to intuit or rationalize the differences in red/IR sensitivity across closely related variants. An important example appears in Figure 3C vs 3B. I think the AvTod+21 in 3B should be equivalent to the DsRed response in the second column of 3C (AvTod+21 + DmDERusk), except, of course, that the bacteria in 3C carry an additional plasmid for the DERusk system. However, in 3B, the response to red light is substantial (~50% as strong as that for IR), whereas in 3C, red light elicits no response at all. What is the difference? This matters because AvTod+21 and DmDERusk represent the best "orthogonal" red and infrared light responses, but this is not at all obvious from 3B, where AvTod+21 still causes a substantial (and, for orthogonality, undesirable) response under red light. Perhaps subtle differences in expression level due to plasmid changes cause these differences in light responses? Could the authors test how the expression level affects these responses? The paper would be greatly improved if the diverse red/IR responses could be rationalized by some design criteria.

      As noted above in our response to reviewer #2, we have now normalized all fluorescence readings to joint reference values, thus allowing a better comparison across experiments.

      The reviewer is correct in noting that, upon multiplexing, the individual plasmid systems support lower fluorescence levels than when used in isolation. We speculate that the combination of two plasmids may affect their copy numbers (despite the use of different resistance markers and origins of replication) and hence their performance. Likewise, cellular metabolism may be affected when multiple plasmids are combined. These aspects may well account for the absent red-light response of AvTod+21 in the multiplexing experiments, which is indeed unexpected. As we currently cannot provide a clear rationalization for this effect, we recommend verifying the performance of the plasmid setups when multiplexing.

      The paper uses "red" and "infrared" to refer to ~624 nm and ~800 nm light, respectively. I wonder whether it might be possible to shift these peak wavelengths to obtain even better separation for the multiplexing experiments. Perhaps shifting the specific red wavelength could result in better separation between DERusk and AvTod systems, for example? Could the authors comment on this (maybe based on action spectra of their previously developed tools) or perhaps test a few additional stimulation wavelengths?

      The choice of illumination wavelengths used in these experiments is dictated by the LED setups available for illumination of microtiter plates. On the one hand, we are using an SMD (surface-mount device) three-color LED with a fixed wavelength of the red channel around 624 nm (see Hennemann et al., 2018). On the other hand, we are deploying a custom-built device with LEDs emitting at around 800 nm (see Stüven et al., 2019 and this work). Adjusting these wavelengths is therefore challenging, although doubtless of potential interest.

      To address this reviewer comment, we have added a statement to the text that the excitation wavelengths may be varied to improve multiplexed applications.

      Additional minor comments:

      (1) Figure 2C: It would be very helpful to place a legend on the figure panel for what the colors indicate, since they are unique to this panel and non-intuitive.

      This comment coincides with one by reviewer #2, and we have added pertinent legends to this and related supplementary figures.

      (2) Figure 3C: it is not obvious which system uses DsRed and which uses YPet in each combination, since the text indicates that all combinations were cloned, and this is not clearly described in the legend. Is it always the first construct in the figure legend listed for DsRed and the second for YPet?

      For clarification, we have revised the x-axis labels in Fig. 3C. (And yes, it is as this reviewer surmises: the first of the two constructs harbored DsRed and the second one YPet.)

    1. eLife Assessment

      This global study compares the outputs of environmental niche models of the highly pathogenic avian influenza niche constructed for two distinct periods, and uses differences between those outputs to suggest that the changed case numbers and distribution relate to the intensification of chicken and duck farming and to extensive cultivation. While the study is a useful update to existing niche models of highly pathogenic avian influenza, the justification for using environmental niche models to explore correlative relationships between land-cover change and changed case epidemiology is incomplete. Key assumptions have not been adequately clarified for the reader's benefit, and consequently the likely limitations of the work are not communicated sufficiently clearly.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to predict ecological suitability for transmission of highly pathogenic avian influenza (HPAI) using ecological niche models. This class of models identifies correlations between the locations of species or disease detections and the environment. These correlations are then used to predict habitat suitability (in this work, ecological suitability for disease transmission) in locations where surveillance of the species or disease has not been conducted. The authors fit separate models for HPAI detections in wild birds and farmed birds, for two strains of HPAI (H5N1 and H5Nx) and for two time periods, pre- and post-2020. The authors also validate models fitted to disease occurrence data from pre-2020 using post-2020 occurrence data. I thank the authors for taking the time to respond to my initial review and I provide some follow-up below.

      Detailed comments:

      In my review, I asked the authors to clarify the meaning of "spillover" within the HPAI transmission cycle. This term is still not entirely clear: at lines 409-410, the authors use the term with reference to transmission between wild birds and farmed birds, as distinct from transmission between farmed birds. It is implied but not explicitly stated that "spillover" is relevant to the transmission cycle in farmed birds only. The sentence, "we developed separate ecological niche models for wild and domestic bird HPAI occurrences ..." could have been supported by a clear sentence describing the transmission cycle, to prime the reader for why two separate models were necessary.

      I also queried the importance of (dead-end) mammalian infections to a model of the HPAI transmission risk, to which the authors responded: "While spillover events of HPAI into mammals have been documented, these detections are generally considered dead-end infections and do not currently represent sustained transmission chains. As such, they fall outside the scope of our study, which focuses on avian hosts and models ecological suitability for outbreaks in wild and domestic birds." I would argue that any infections, whether they are in dead-end or competent hosts, represent the presence of environmental conditions to support transmission so are certainly relevant to a niche model and therefore within scope. It is certainly understandable if the authors have not been able to access data of mammalian infections, but it is an oversight to dismiss these infections as irrelevant.

      Correlative ecological niche models, including BRTs, learn relationships between occurrence data and covariate data to make predictions, irrespective of correlations between covariates. I am not convinced that the authors can make any "interpretation" (line 298) that the covariates that are most informative to their models have any "influence" (line 282) on their response variable. Indeed, the observation that "land-use and climatic predictors do not play an important role in the niche ecological models" (line 286), while "intensive chicken population density emerges as a significant predictor" (line 282), raises the question: from an operational perspective, is the best (e.g., most interpretable and quickest to generate) model of HPAI risk a map of poultry farming intensity?

      I have more significant concerns about the authors' treatment of sampling bias: "We agree with the Reviewer's comment that poultry density could have potentially been considered to guide the sampling effort of the pseudo-absences to consider when training domestic bird models. We however prefer to keep using a human population density layer as a proxy for surveillance bias to define the relative probability to sample pseudo-absence points in the different pixels of the background area considered when training our ecological niche models. Indeed, given that poultry density is precisely one of the predictors that we aim to test, considering this environmental layer for defining the relative probability to sample pseudo-absences would introduce a certain level of circularity in our analytical procedure, e.g. by artificially increasing the influence of that particular variable in our models." The authors have elected to ignore a fundamental feature of distribution modelling with occurrence-only data: if we include a source of sampling bias as a covariate and do not include it when we sample background data, then that covariate would appear to be correlated with presence. They acknowledge this later in their response to my review: "...assuming a sampling bias correlated with poultry density would result in reducing its effect as a risk factor." In other words, the apparent predictive capacity of poultry density is a function of how the authors have constructed the sampling bias for their models. A reader of the manuscript can reasonably ask the question: to what degree is the model a model of HPAI transmission risk, and to what degree is the model a model of the observation process?

      The sentence at lines 474-477 is a helpful addition; however, the preceding sentence, "Another approach to sampling pseudo-absences would have been to distribute them according to the density of domestic poultry," (line 474) is included without acknowledgement of the flow-on consequence for one of the key findings of the manuscript, that "...intensive chicken population density emerges as a significant predictor..." (line 282). The additional context on the EMPRES-i dataset at lines 475-476 ("the locations of outbreaks ... are often georeferenced using place name nomenclatures") conflicts with the description of the dataset at line 407 ("precise location coordinates"). Ultimately, the choices that the authors have made are entirely defensible through a clear, concise description of model features and assumptions, and precise language to guide the reader through interpretation of results. I am not satisfied that this is provided in the revised manuscript.
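      The circularity concern can be illustrated with a toy simulation in which every assumption is hypothetical. The landscape has 100 cells with poultry density 0 to 99, and human density is uniform and unrelated to poultry. Detections are driven purely by poultry density, standing in for an observation process such as farm-focused surveillance. random.choices samples with replacement, a simplification. The sketch compares mean poultry density in presences against background points drawn under two different bias layers.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical landscape: 100 cells, poultry density 0..99, uniform human density
cells = list(range(100))
poultry = list(range(100))
human = [1.0] * 100

# Assumed observation process: detections occur in proportion to poultry density
presences = rng.choices(cells, weights=poultry, k=5000)

# Background (pseudo-absence) points drawn under two alternative bias layers
background_human = rng.choices(cells, weights=human, k=5000)
background_poultry = rng.choices(cells, weights=poultry, k=5000)


def mean_poultry(sample):
    """Mean poultry density over the sampled cells."""
    return mean(poultry[c] for c in sample)


print("presences:", round(mean_poultry(presences), 1))
print("background weighted by human density:", round(mean_poultry(background_human), 1))
print("background weighted by poultry density:", round(mean_poultry(background_poultry), 1))
```

      The expected mean poultry density is about 66 for the presences and for the poultry-weighted background, but 49.5 for the human-weighted background. With the human-weighted background, a model would therefore find poultry density strongly associated with presence, even though here it shapes only where detections are made.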

      The authors have slightly misunderstood my comment on "extrapolation": I referred to "environmental extrapolation" in my review without being particularly explicit about my meaning. By "environmental extrapolation", I meant to ask whether the models were predicting to environments that are outside the extent of environments included in the occurrence data used in the manuscript. The authors appear to have understood this to be a comment on geographic extrapolation, or predicting to areas outside the geographic extent included in occurrence data, e.g.: "For H5Nx post-2020, areas of high predicted ecological suitability, such as Brazil, Bolivia, the Caribbean islands, and Jilin province in China, likely result from extrapolations, as these regions reported few or no outbreaks in the training data" (lines 195-197). Is the model extrapolating in environmental space in these regions? This is unclear. I do not suggest that the authors should carry out further analysis, but the multivariate environmental similarity surface (MESS; see Elith et al., 2010: https://doi.org/10.1111/j.2041-210X.2010.00036.x ) is a useful tool to visualise environmental extrapolation and aid model interpretation.
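      For readers unfamiliar with the MESS, here is a minimal pure-Python sketch of the per-location calculation following the definition in Elith et al. (2010). The function names and example covariate values are hypothetical. In practice the calculation is repeated for every grid cell of the prediction area, with the environmental values of the data used to fit the model as the reference.

```python
def mess_variable(value, reference):
    """Similarity of one covariate value to its reference distribution,
    following the MESS definition of Elith et al. (2010)."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference values must span a non-zero range")
    # f: percentage of reference points with a value smaller than `value`
    f = 100.0 * sum(r < value for r in reference) / len(reference)
    if f == 0:
        return 100.0 * (value - lo) / (hi - lo)  # at or below the reference minimum
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    return 100.0 * (hi - value) / (hi - lo)  # above the reference maximum


def mess(point, reference_points):
    """MESS value of one location: the minimum similarity over all covariates.
    Negative values flag environments outside the range of the reference data."""
    columns = list(zip(*reference_points))
    return min(mess_variable(v, col) for v, col in zip(point, columns))


# Hypothetical reference data with two covariates per point
training_env = [(10.0, 0.2), (15.0, 0.4), (20.0, 0.5), (25.0, 0.8)]
print(mess((17.0, 0.45), training_env))  # inside the sampled environmental range
print(mess((30.0, 0.45), training_env))  # first covariate beyond the sampled range
```

      Mapping these values over the prediction area would show where high predicted suitability coincides with negative MESS values, i.e., with environmental rather than purely geographic extrapolation.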

      On the subject of "extrapolation", I am also concerned by the additions at lines 362-370: "...our models extrapolate environmental suitability for H5Nx in wild birds in areas where few or no outbreaks have been reported. This discrepancy may be explained by limited surveillance or underreporting in those regions." The "discrepancy" cited here is a feature of the input dataset, a function of the observation distribution that should be captured in pseudo-absence data. The authors state that Kazakhstan and Central Asia are areas of interest, and that the environments in this region are outside the extent of environments captured in the occurrence dataset, although it is unclear whether "extrapolation" is informed by a quantitative tool like a MESS or judged by some other qualitative test. The authors then cite Australia as an example of a region with some predicted suitability but no HPAI outbreaks to date, however this discussion point is not linked to the idea that the presence of environmental conditions to support transmission need not imply the occurrence of transmission (as in the addition, "...spatial isolation may imply a lower risk of actual occurrences..." at line 214). Ultimately, the authors have not added any clear comment on model uncertainty (e.g., variation between replicated BRTs) as I suggested might be helpful to support their description of model predictions.
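      One simple way to express the model uncertainty mentioned above is to summarize, pixel by pixel, the spread of predictions across replicated BRTs. The sketch below assumes each replicate's predicted suitability is available as a flat list with the same pixel order. The function name and values are hypothetical.

```python
from statistics import mean, stdev


def replicate_summary(predictions):
    """Per-pixel mean and standard deviation across replicated model runs.

    predictions: list of replicates, each a list of suitability values with
    one entry per pixel (same pixel order in every replicate).
    """
    if len(predictions) < 2:
        raise ValueError("need at least two replicates")
    per_pixel = list(zip(*predictions))
    return [mean(p) for p in per_pixel], [stdev(p) for p in per_pixel]


# Hypothetical suitability predictions from three replicated BRTs, four pixels
replicates = [
    [0.10, 0.80, 0.50, 0.30],
    [0.12, 0.78, 0.20, 0.35],
    [0.08, 0.82, 0.80, 0.25],
]
means, sds = replicate_summary(replicates)
for i, (m, s) in enumerate(zip(means, sds)):
    print(f"pixel {i}: mean {m:.2f}, sd {s:.2f}")
```

      A pixel whose replicates disagree widely, such as the third one here, warrants more cautious interpretation than one whose replicates agree closely.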

      All of my criticisms are, of course, applied with the understanding that niche modelling is imperfect for a disease like HPAI, and that data may be biased/incomplete, etc.: these caveats are common across the niche modelling literature. However, if language around the transmission cycle, the niche, and the interpretation of any of the models is imprecise, which I find it to be in the revised manuscript, it undermines all of the science that is presented in this work.

    3. Reviewer #2 (Public review):

      Summary:

      The geographic range of highly pathogenic avian influenza cases changed substantially around 2020, and there is much interest in understanding why. Since 2020, the pathogen has irrupted in the Americas, and its distribution in Asia has changed dramatically. This study aimed to determine which spatial factors (environmental, agronomic and socio-economic) explain the change in numbers and locations of cases reported since 2020 (2020--2023). That is a causal question, which they address by applying a correlative environmental niche modelling (ENM) approach to the avian influenza case data before (2015--2020) and after 2020 (2020--2023), separately for confirmed cases in wild and domestic birds. To address their questions, they compare the outputs of the respective models, and those of the first global model of the HPAI niche published by Dhingra et al. (2016).

      ENM is a correlative approach useful for extrapolating understanding based on sparse, geographically referenced observational data over un- or under-sampled areas with similar environmental characteristics in the form of a continuous map. In this case, because the selected covariates on land cover, land use, population and environment are broadly available over the entire world, modelled associations between the response and those covariates can be projected (predicted) back to space in the form of a continuous map of the HPAI niche for the entire world.

      Strengths:

      The authors are clear about expected bias in the detection of cases, such as geographic variation in surveillance effort (testing of symptomatic or dead wildlife, testing domestic flocks) and in general more detections near areas of higher human population density (because if a tree falls in a forest and there is no-one there, etc.), and take steps to ameliorate those. The authors use boosted regression trees to implement the ENM, which typically feature among the best-performing models for this application (also known as habitat suitability models). They ran replicate sets of the analysis for each of their model targets (wild/domestic x pathogen variant), which can help produce stable predictions. Their code and data are provided, though I did not verify that the work was reproducible.

      The paper can be read as a partial update to the first global model of H5Nx transmission by Dhingra and others published in 2016 and explicitly follows many methodological elements. Because they use the same covariate sets as used by Dhingra et al 2016 (including the comparisons of the performance of the sets in spatial cross-validation) and for both time periods of interest in the current work, comparison of model outputs is possible. The authors further facilitate those comparisons with clear graphics and supplementary analyses and presentation. The models can also be explored interactively at a weblink provided in text, though it would be good to see the model training data there too.

      The authors' comparison of ENM model outputs generated from the distinct HPAI case datasets is interesting and worthwhile, though for me, only as a response to differently framed research questions.

      Weaknesses:

      This well-presented and technically well-executed paper has one major weakness to my mind. I don't believe that ENM models were an appropriate tool to address their stated goal, which was to identify the factors that "explain" changing HPAI epidemiology.

      Here is how I understand and unpack that weakness:

      (1) Because of their fundamentally correlative nature, ENMs are not a strong candidate for exploring or inferring causal relationships.

      (2) Generating ENMs for a species whose distribution is undergoing broad-scale range change is complicated and requires particular caution and nuance in interpretation (e.g., Elith et al., 2010). An important general assumption of environmental niche models is that the target species is at some kind of distributional equilibrium (at time scales relevant to the model application). In practice, that means the species has had an opportunity to reach all suitable habitats, so its absence from some of them can be interpreted as either an unfavourable environment or interactions with other species. Here, the response data sets (H5N1 or H5Nx case data in domestic or wild birds) were divided into two periods, 2015-2020 and 2020-2023, on the rationale that the geographic locations and host-species profile of cases detected in the latter period were suggestive of changed epidemiology. In comparing outputs from multiple ENMs for the same target from distinct time periods, the authors are expertly working in, or even dancing around, what is a known grey area, and they need to make the necessary assumptions and caveats obvious to readers.

      (3) To generate global prediction maps via ENM, only variables that exist at an appropriate resolution over the desired area can be supplied as covariates. What processes could influence the changing epidemiology of a pathogen, and are there covariates that represent them? Introduction to a new geographic area (continent) with a naive population, immunity in previously exposed populations, control measures to limit spread such as vaccination or destruction of vulnerable populations or flocks? Might those control measures be more or less likely depending on the country, as a function of its resources and governance? There aren't globally available datasets that speak to those factors, so the question is not why they were omitted but rather whether the authors' decision to choose ENMs, given their question, was justified. How valuable are insights based on patterns of correlation change when considering different temporal sets of HPAI cases in relation to a common and somewhat anachronistic set of covariates?

      (4) In general the study is somewhat incoherent with respect to time. Though the case data come from different time periods, each response dataset was modelled separately using exactly the same covariate dataset that predated both sets. That decision should be understood as a strong assumption on the part of the authors that conditions the interpretation: the world (as represented by the covariate set) is immutable, so the model has to return different correlative associations between the case data and the covariates to explain the new data. While the world represented by the selected covariates *may* be relatively stable (could be statistically confirmed), what about the world not represented by the covariates (see point 3)?

      References:

      Dhingra, M. S., et al. (2016). Global mapping of highly pathogenic avian influenza H5N1 and H5Nx clade 2.3.4.4 viruses with spatial cross-validation. eLife, 5, e19571. https://doi.org/10.7554/eLife.19571

      Elith, J., Kearney, M., & Phillips, S. (2010). The art of modelling range-shifting species. Methods in Ecology and Evolution, 1(4), 330-342.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      We thank the Reviewers for their thorough attention to our paper and the interesting discussion about the findings. Before responding to more specific comments, here are some general points we would like to clarify:

      (1) Ecological niche models are indeed correlative models, and we used them to highlight environmental factors associated with HPAI outbreaks within two host groups. We will further revise the terminology that could still unintentionally suggest causal inference. The few remaining ambiguities were mainly in the Discussion section, where our intent was to interpret the results in light of the broader scientific literature. In particular, we will change the following expressions:

      - “Which factors can explain…” to “Which factors are associated with…” (line 75);

      - “the environmental and anthropogenic factors influencing” to “the environmental and anthropogenic factors that are correlated with” (line 273);

      - “underscoring the influence” to “underscoring the strong association” (line 282).

      (2) We respectfully disagree with the suggestion that an ecological niche modelling (ENM) approach is not appropriate for this work and the research question addressed therein. Ecological niche models are specifically designed to estimate the spatial distribution of the environmental suitability of species and pathogens, making them well suited to our research questions. In our study, we have also explicitly detailed the known limitations of ecological niche models in the Discussion section, in line with prior literature, to ensure their appropriate interpretation in the context of HPAI.

      (3) The environmental layers used in our models were restricted to those available at a global scale, as listed in Supplementary Information Resources S1 (https://github.com/sdellicour/h5nx_risk_mapping/blob/master/Scripts_%26_data/SI_Resource_S1.xlsx). Naturally, not all potentially relevant environmental factors could be included, but the selected layers are explicitly documented and only these were assessed for their importance. Despite this limitation, the performance metrics indicate that the models performed well, suggesting that the chosen covariates capture meaningful associations with HPAI occurrence at a global scale.

      Reviewer #1 (Public review):

      The authors aim to predict ecological suitability for transmission of highly pathogenic avian influenza (HPAI) using ecological niche models. This class of models identifies correlations between the locations of species or disease detections and the environment. These correlations are then used to predict habitat suitability (in this work, ecological suitability for disease transmission) in locations where surveillance of the species or disease has not been conducted. The authors fit separate models for HPAI detections in wild birds and farmed birds, for two strains of HPAI (H5N1 and H5Nx) and for two time periods, pre- and post-2020. The authors also validate models fitted to disease occurrence data from pre-2020 using post-2020 occurrence data. I thank the authors for taking the time to respond to my initial review and I provide some follow-up below.

      Detailed comments:

      In my review, I asked the authors to clarify the meaning of "spillover" within the HPAI transmission cycle. This term is still not entirely clear: at lines 409-410, the authors use the term with reference to transmission between wild birds and farmed birds, as distinct from transmission between farmed birds. It is implied but not explicitly stated that "spillover" is relevant to the transmission cycle in farmed birds only. The sentence, "we developed separate ecological niche models for wild and domestic bird HPAI occurrences ..." could have been supported by a clear sentence describing the transmission cycle, to prime the reader for why two separate models were necessary.

      We respectfully disagree that the term “spillover” is unclear in the manuscript. In both the Methods and Discussion sections (lines 387-391 and 409-414), we explicitly define “spillover” as the introduction of HPAI viruses from wild birds into domestic poultry, and we distinguish this from secondary farm-to-farm transmission. Our use of separate ecological niche models for wild and domestic outbreaks reflects not only the distinction between primary spillover and secondary transmission, but also the fundamentally different ecological processes, surveillance systems, and management implications that shape outbreaks in these two groups. We will clarify this choice in the revised manuscript when introducing the separate models. Furthermore, on line 83, we will add “as these two groups are influenced by different ecological processes, surveillance biases, and management contexts”.

      I also queried the importance of (dead-end) mammalian infections to a model of the HPAI transmission risk, to which the authors responded: "While spillover events of HPAI into mammals have been documented, these detections are generally considered dead-end infections and do not currently represent sustained transmission chains. As such, they fall outside the scope of our study, which focuses on avian hosts and models ecological suitability for outbreaks in wild and domestic birds." I would argue that any infections, whether they are in dead-end or competent hosts, represent the presence of environmental conditions to support transmission so are certainly relevant to a niche model and therefore within scope. It is certainly understandable if the authors have not been able to access data of mammalian infections, but it is an oversight to dismiss these infections as irrelevant.

      We understand the Reviewer’s point, but our study was designed to model HPAI occurrence in avian hosts only. We therefore restricted our analysis to wild birds and domestic poultry, which represent the primary hosts for HPAI circulation and the focus of surveillance and control measures. While mammalian detections have been reported, they are outside the scope of this work.

      Correlative ecological niche models, including BRTs, learn relationships between occurrence data and covariate data to make predictions, irrespective of correlations between covariates. I am not convinced that the authors can make any "interpretation" (line 298) that the covariates that are most informative to their models have any "influence" (line 282) on their response variable. Indeed, the observation that "land-use and climatic predictors do not play an important role in the niche ecological models" (line 286), while "intensive chicken population density emerges as a significant predictor" (line 282) begs the question: from an operational perspective, is the best (e.g., most interpretable and quickest to generate) model of HPAI risk a map of poultry farming intensity?

      We agree that poultry density may partly reflect reporting bias, but we also assumed it to be a meaningful predictor of HPAI risk. Its importance in our models is therefore expected. Importantly, our BRT framework does more than reproduce the poultry distribution: it captures non-linear relationships and interactions with other covariates, allowing a more nuanced characterisation of risk than a simple poultry density map. Note also that our models distinguish between intensive chicken density, extensive chicken density and duck density. Therefore, the result is not a “map of poultry farming intensity”.

      At line 282, we used the word “influence” while fully recognising that correlative models cannot establish causality. Indeed, in our analyses, “relative influence” refers to the importance metric produced by the BRT algorithm (Ridgeway, 2020), which measures correlative associations between environmental factors and outbreak occurrences. These scores are interpreted in light of the broader scientific literature; our interpretations therefore build on both our results and existing evidence, rather than on our models alone. However, in the next version of the paper, we will revise the sentence to: “underscoring the strong association of poultry farming practices with HPAI spread (Dhingra et al., 2016)”.
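      To illustrate why a relative influence score is a correlative summary rather than an effect size, the sketch below pools the loss reduction achieved by every split on each covariate and rescales the totals to sum to 100. This follows the spirit of the relative influence measure reported by BRT software such as the R gbm package, but it is not the authors' code. The split records and covariate names are hypothetical.

```python
from collections import defaultdict


def relative_influence(split_records):
    """Relative influence (%) per covariate.

    split_records: iterable of (variable_name, improvement) pairs, one per
    internal split pooled over all trees. `improvement` is the reduction in
    the loss achieved by that split (a hypothetical input format).
    """
    totals = defaultdict(float)
    for variable, improvement in split_records:
        totals[variable] += improvement
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no positive split improvements")
    # Rescale so the scores sum to 100.
    return {v: 100.0 * s / grand_total for v, s in totals.items()}


# Hypothetical split improvements pooled over a tiny ensemble.
splits = [
    ("intensive_chicken_density", 4.0),
    ("duck_density", 1.0),
    ("intensive_chicken_density", 2.0),
    ("human_population_density", 3.0),
]
ri = relative_influence(splits)
print({k: round(v, 1) for k, v in sorted(ri.items(), key=lambda kv: -kv[1])})
# -> {'intensive_chicken_density': 60.0, 'human_population_density': 30.0, 'duck_density': 10.0}
```

      A covariate scores highly when splits on it reduce the loss, regardless of whether it drives transmission or tracks where outbreaks were observed.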

      I have more significant concerns about the authors' treatment of sampling bias: "We agree with the Reviewer's comment that poultry density could have potentially been considered to guide the sampling effort of the pseudo-absences to consider when training domestic bird models. We however prefer to keep using a human population density layer as a proxy for surveillance bias to define the relative probability to sample pseudo-absence points in the different pixels of the background area considered when training our ecological niche models. Indeed, given that poultry density is precisely one of the predictors that we aim to test, considering this environmental layer for defining the relative probability to sample pseudo-absences would introduce a certain level of circularity in our analytical procedure, e.g. by artificially increasing to influence of that particular variable in our models." The authors have elected to ignore a fundamental feature of distribution modelling with occurrence-only data: if we include a source of sampling bias as a covariate and do not include it when we sample background data, then that covariate would appear to be correlated with presence. They acknowledge this later in their response to my review: "...assuming a sampling bias correlated with poultry density would result in reducing its effect as a risk factor." In other words, the apparent predictive capacity of poultry density is a function of how the authors have constructed the sampling bias for their models. A reader of the manuscript can reasonably ask the question: to what degree is the model a model of HPAI transmission risk, and to what degree is it a model of the observation process?

      The sentence at lines 474-477 is a helpful addition; however, the preceding sentence, "Another approach to sampling pseudo-absences would have been to distribute them according to the density of domestic poultry," (line 474) is included without acknowledgement of the flow-on consequence for one of the key findings of the manuscript, that "...intensive chicken population density emerges as a significant predictor..." (line 282). The additional context on the EMPRES-i dataset at lines 475-476 ("the locations of outbreaks ... are often georeferenced using place name nomenclatures") conflicts with the description of the dataset at line 407 ("precise location coordinates"). Ultimately, the choices that the authors have made are entirely defensible through a clear, concise description of model features and assumptions, and precise language to guide the reader through interpretation of results. I am not satisfied that this is provided in the revised manuscript.

      We thank the Reviewer for this important point. To address it, we compared model predictive performance and covariate relative influences obtained when pseudo-absences were weighted by poultry density versus human population density (Author response table 1). The results show that differences between the two approaches are marginal, both in predictive performance (ΔAUC ranging from -0.013 to +0.002) and in the ranking of key predictors (see below Author response images 1 and 2). For instance, intensive chicken density consistently emerged as an important predictor regardless of the bias layer used.

      Note: the comparison was conducted using a simplified BRT configuration for computational efficiency (fewer trees, fixed 5-fold random cross-validation, and standardised parameters). Therefore, absolute values of AUC and variable importance may differ slightly from those in the manuscript, but the relative ranking of predictors and the overall conclusions remain consistent.

      Given these small differences, we retained the approach using human population density. We agree that poultry density partly reflects surveillance bias as well as true epidemiological risk, and we will clarify this in the revised manuscript by noting that the predictive role of poultry density reflects both biological processes and surveillance systems. Furthermore, on line 289, we will add “We note, however, that intensive poultry density may reflect both surveillance intensity and epidemiological risk, and its predictive role in our models should be interpreted in light of both processes”.
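      The mechanism at issue can be sketched as follows. Pseudo-absence (background) cells are drawn with probability proportional to a bias layer and can optionally be restricted to an allowed set of cells, for example cells in countries that reported outbreaks. The grid, layer values and function name are hypothetical, and this is not the authors' implementation. Swapping the weights from human population density to poultry density is the kind of change compared in Author response table 1.

```python
import random


def sample_pseudo_absences(bias_layer, n, allowed=None, seed=0):
    """Draw n background cell indices with probability proportional to bias_layer.

    bias_layer: list of non-negative weights, one per grid cell (e.g. human
                population density or poultry density; hypothetical values).
    allowed:    optional set of eligible cell indices (e.g. cells in countries
                that reported outbreaks).
    Sampling is with replacement, as random.choices does.
    """
    rng = random.Random(seed)
    cells = [i for i in range(len(bias_layer)) if allowed is None or i in allowed]
    weights = [bias_layer[i] for i in cells]
    if not cells or sum(weights) <= 0:
        raise ValueError("no eligible cell has positive sampling weight")
    return rng.choices(cells, weights=weights, k=n)


# Hypothetical five-cell grid.
human_density = [0.0, 5.0, 50.0, 500.0, 5.0]
poultry_density = [0.0, 400.0, 20.0, 10.0, 100.0]

human_bg = sample_pseudo_absences(human_density, 1000)
poultry_bg = sample_pseudo_absences(poultry_density, 1000)
# Densely populated cell 3 dominates the human-weighted background only.
print(human_bg.count(3) > poultry_bg.count(3))
```

      Whichever layer sets the background weights also shapes the contrast between presences and pseudo-absences along that layer. This is why the choice of bias layer and the apparent importance of a covariate such as poultry density cannot be assessed independently.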

      Author response table 1.

      Comparison of model predictive performance (AUC) between pseudo-absence sampling weighted by poultry density and by human population density, across host groups, virus types and time periods. Differences in AUC are shown as the value for poultry-weighted minus human-weighted pseudo-absences.

      Author response image 1.

      Comparison of variable relative influence (%) between models trained with pseudo-absences weighted by poultry density (red) and human population density (blue) for domestic bird outbreaks. Results are shown for four datasets: H5N1 (<2020), H5N1 (>2020), H5Nx (<2020), and H5Nx (>2020).

      Author response image 2.

      Comparison of variable relative influence (%) between models trained with pseudo-absences weighted by poultry density (red) and human population density (blue) for wild bird outbreaks. Results are shown for three datasets: H5N1 (>2020), H5Nx (<2020), and H5Nx (>2020).
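      The ΔAUC reported in Author response table 1 is a difference of two AUC values. The sketch below computes AUC as the probability that a randomly chosen presence scores higher than a randomly chosen pseudo-absence, with ties counted as one half, and then takes the poultry-weighted minus human-weighted difference. The scores are invented for illustration and are not the values behind the table. In practice each model is refitted with its own pseudo-absences, so the presence scores would also differ. Here only the background set changes, to keep the example small.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: P(presence score > pseudo-absence score), ties count 0.5."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted suitability at presences and at two pseudo-absence sets.
presences = [0.9, 0.8, 0.7, 0.4]
poultry_weighted_bg = [0.6, 0.5, 0.3, 0.2]
human_weighted_bg = [0.5, 0.3, 0.2, 0.1]

delta_auc = auc(presences, poultry_weighted_bg) - auc(presences, human_weighted_bg)
print(round(delta_auc, 4))  # -> -0.0625
```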

      The authors have slightly misunderstood my comment on "extrapolation": I referred to "environmental extrapolation" in my review without being particularly explicit about my meaning. By "environmental extrapolation", I meant to ask whether the models were predicting to environments that are outside the extent of environments included in the occurrence data used in the manuscript. The authors appear to have understood this to be a comment on geographic extrapolation, or predicting to areas outside the geographic extent included in occurrence data, e.g.: "For H5Nx post-2020, areas of high predicted ecological suitability, such as Brazil, Bolivia, the Caribbean islands, and Jilin province in China, likely result from extrapolations, as these regions reported few or no outbreaks in the training data" (lines 195-197). Is the model extrapolating in environmental space in these regions? This is unclear. I do not suggest that the authors should carry out further analysis, but the multivariate environmental similarity surface (MESS; see Elith et al., 2010) is a useful tool to visualise environmental extrapolation and aid model interpretation.

      On the subject of "extrapolation", I am also concerned by the additions at lines 362-370: "...our models extrapolate environmental suitability for H5Nx in wild birds in areas where few or no outbreaks have been reported. This discrepancy may be explained by limited surveillance or underreporting in those regions." The "discrepancy" cited here is a feature of the input dataset, a function of the observation distribution that should be captured in pseudo-absence data. The authors state that Kazakhstan and Central Asia are areas of interest, and that the environments in this region are outside the extent of environments captured in the occurrence dataset, although it is unclear whether "extrapolation" is informed by a quantitative tool like a MESS or judged by some other qualitative test. The authors then cite Australia as an example of a region with some predicted suitability but no HPAI outbreaks to date, however this discussion point is not linked to the idea that the presence of environmental conditions to support transmission need not imply the occurrence of transmission (as in the addition, "...spatial isolation may imply a lower risk of actual occurrences..." at line 214). Ultimately, the authors have not added any clear comment on model uncertainty (e.g., variation between replicated BRTs) as I suggested might be helpful to support their description of model predictions.
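      For readers unfamiliar with the MESS, the sketch below follows the calculation described by Elith et al. (2010). For each covariate, f is the percentage of reference (training) values smaller than the value at the prediction location. The similarity is 2f or 2(100 - f) inside the reference range, and a scaled (possibly negative) distance outside it. The MESS is the minimum over covariates, and negative values flag environmental extrapolation. Covariate names and values are hypothetical, and this is not part of the manuscript's analysis.

```python
def mess_similarity(p, reference):
    """Similarity of value p to the reference values of one covariate."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference values must not all be equal")
    # f: percentage of reference values smaller than p.
    f = 100.0 * sum(1 for r in reference if r < p) / len(reference)
    if f == 0:
        return 100.0 * (p - lo) / (hi - lo)   # at or below the reference minimum
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    return 100.0 * (hi - p) / (hi - lo)       # above the reference maximum


def mess(point, reference_table):
    """MESS at one location: minimum similarity across covariates.

    point: dict covariate -> value at the prediction location.
    reference_table: dict covariate -> list of values at training locations.
    """
    return min(mess_similarity(point[k], reference_table[k]) for k in point)


# Hypothetical reference (training) environments.
reference = {
    "mean_temperature": [5.0, 10.0, 15.0, 20.0],
    "duck_density": [0.0, 10.0, 20.0, 40.0],
}
inside = {"mean_temperature": 12.0, "duck_density": 15.0}
outside = {"mean_temperature": 30.0, "duck_density": 15.0}
print(mess(inside, reference))              # -> 100.0
print(round(mess(outside, reference), 2))   # -> -66.67
```

      Mapping this value over every prediction cell would show where suitability estimates rest on environmental extrapolation rather than interpolation.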

      Many thanks for the clarification. Indeed, we interpreted your previous comments in terms of geographic extrapolations. We thank the Reviewer for these observations. We will adjust the wording to further clarify that predictions of ecological suitability in areas with few or no reported outbreaks (e.g., Central Asia, Australia) are not model errors but expected extrapolations, since ecological suitability does not imply confirmed transmission (for instance, on Line 362: “our models extrapolate environmental suitability” will be changed to “Interestingly, our models extrapolate geographical”). These predictions indicate potential environments favorable to circulation if the virus were introduced.

      In our study, model uncertainty is formally assessed when comparing the predictive performance of our models (Fig. S3, Table S1), the relative influence (Table S3) and the response curves (Fig. 2) associated with each environmental factor (Table S2). All of these results confirm good convergence between replicates. Finally, we indeed did not use a quantitative tool such as a MESS to assess extrapolation, relying instead on qualitative interpretation of model outputs.
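      One simple way to summarise the variation between replicated BRTs raised by the Reviewer is a per-cell mean and standard deviation of predicted suitability across replicate models, which could be mapped alongside the mean prediction. The sketch below assumes each replicate's predictions are stored as a list aligned by grid cell. The values are hypothetical.

```python
from statistics import mean, stdev


def replicate_summary(replicate_predictions):
    """Per-cell mean and sample SD of suitability across replicate models.

    replicate_predictions: list of replicates, each a list of predicted
    suitability values for the same ordered grid cells.
    At least two replicates are required for the SD.
    """
    if len(replicate_predictions) < 2:
        raise ValueError("need at least two replicates")
    n_cells = len(replicate_predictions[0])
    if any(len(r) != n_cells for r in replicate_predictions):
        raise ValueError("replicates must cover the same cells")
    summary = []
    for cell in range(n_cells):
        values = [r[cell] for r in replicate_predictions]
        summary.append((mean(values), stdev(values)))
    return summary


# Three hypothetical replicates over four cells.
reps = [
    [0.10, 0.80, 0.50, 0.30],
    [0.12, 0.78, 0.20, 0.30],
    [0.08, 0.82, 0.80, 0.30],
]
for cell, (m, sd) in enumerate(replicate_summary(reps)):
    print(cell, round(m, 2), round(sd, 2))
```

      Cell 2, where the replicates disagree, stands out with a much larger SD than the other cells, even though its mean prediction looks unremarkable.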

      All of my criticisms are, of course, applied with the understanding that niche modelling is imperfect for a disease like HPAI, and that data may be biased/incomplete, etc.: these caveats are common across the niche modelling literature. However, if language around the transmission cycle, the niche, and the interpretation of any of the models is imprecise, which I find it to be in the revised manuscript, it undermines all of the science that is presented in this work.

      We respectfully disagree with this comment. The scope of our study and the methods employed are clearly defined in the manuscript, and the limitations of ecological niche modelling in this context are explicitly acknowledged in the Discussion section. While we appreciate the Reviewer’s concern, the comment does not provide specific examples of unclear or imprecise language regarding the transmission cycle, niche, or interpretation of the models. Without such examples, it is difficult to identify further revisions that would improve clarity.

      Reviewer #2 (Public review):

      The geographic range of highly pathogenic avian influenza cases changed substantially around 2020, and there is much interest in understanding why. Since 2020 the pathogen has irrupted in the Americas and its distribution in Asia has changed dramatically. This study aimed to determine which spatial factors (environmental, agronomic and socio-economic) explain the change in numbers and locations of cases reported since 2020 (2020-2023). That is a causal question, which they address by applying a correlative environmental niche modelling (ENM) approach to the avian influenza case data before (2015-2020) and after 2020 (2020-2023), separately for confirmed cases in wild and domestic birds. To address their questions, they compare the outputs of the respective models with each other and with those of the first global model of the HPAI niche published by Dhingra et al. (2016).

      We do not agree with this comment. The manuscript clearly establishes that we are quantitatively assessing factors that are associated with occurrence data before and after 2020. We do not claim to determine causality. One sentence of the Introduction section (lines 75-76) could be confusing, so we intend to modify it in the final revision of our manuscript.

      ENM is a correlative approach useful for extrapolating understandings based on sparse geographically referenced observational data over un- or under-sampled areas with similar environmental characteristics in the form of a continuous map. In this case, because the selected covariates about land cover, use, population and environment are broadly available over the entire world, modelled associations between the response and those covariates can be projected (predicted) back to space in the form of a continuous map of the HPAI niche for the entire world.

      We fully agree with this assessment of ENM approaches.

      Strengths:

      The authors are clear about expected bias in the detection of cases, such as geographic variation in surveillance effort (testing of symptomatic or dead wildlife, testing of domestic flocks) and, in general, more detections near areas of higher human population density (because if a tree falls in a forest and there is no-one there, etc.), and take steps to ameliorate those biases. The authors use boosted regression trees to implement the ENM, which typically feature among the best-performing models for this application (also known as habitat suitability models). They ran replicate sets of the analysis for each of their model targets (wild/domestic x pathogen variant), which can help produce stable predictions. Their code and data are provided, though I did not verify that the work was reproducible.

      The paper can be read as a partial update to the first global model of H5Nx transmission by Dhingra and others published in 2016 and explicitly follows many methodological elements. Because they use the same covariate sets as used by Dhingra et al 2016 (including the comparisons of the performance of the sets in spatial cross-validation) and for both time periods of interest in the current work, comparison of model outputs is possible. The authors further facilitate those comparisons with clear graphics and supplementary analyses and presentation. The models can also be explored interactively at a weblink provided in text, though it would be good to see the model training data there too.

      The authors' comparison of ENM model outputs generated from the distinct HPAI case datasets is interesting and worthwhile, though for me, only as a response to differently framed research questions.

      Weaknesses:

      This well-presented and technically well-executed paper has one major weakness to my mind. I don't believe that ENM models were an appropriate tool to address their stated goal, which was to identify the factors that "explain" changing HPAI epidemiology.

      Here is how I understand and unpack that weakness:

      (1) Because of their fundamentally correlative nature, ENMs are not a strong candidate for exploring or inferring causal relationships.

      (2) Generating ENMs for a species whose distribution is undergoing broad-scale range change is complicated and requires particular caution and nuance in interpretation (e.g., Elith et al., 2010). An important general assumption of environmental niche models is that the target species is at some kind of distributional equilibrium (at time scales relevant to the model application). In practice, that means the species has had an opportunity to reach all suitable habitats, so its absence from some of them can be interpreted as either an unfavourable environment or interactions with other species. Here, the response data sets (H5N1 or H5Nx case data in domestic or wild birds) were divided into two periods, 2015-2020 and 2020-2023, on the rationale that the geographic locations and host-species profile of cases detected in the latter period were suggestive of changed epidemiology. In comparing outputs from multiple ENMs for the same target from distinct time periods, the authors are expertly working in, or even dancing around, what is a known grey area, and they need to make the necessary assumptions and caveats obvious to readers.

      We thank the Reviewer for this observation. First, we constrained pseudo-absence sampling to countries and regions where outbreaks had been reported, reducing the risk of interpreting non-affected areas as environmentally unsuitable. Second, we deliberately split the outbreak data into two periods (2015-2020 and 2020-2023) because we do not assume a single stable equilibrium across the full study timeframe. This division reflects known epidemiological changes around 2020 and allows each period to be modeled independently. Within each period, ENM outputs are interpreted as associations between outbreaks and covariates, not as equilibrium distributions. Finally, by testing prediction across periods, we assessed both niche stability and potential niche shifts. These clarifications will be added to the manuscript to make our assumptions and limitations explicit.

      At line 66, we will add: “Ecological niche model outputs for range-shifting pathogens must therefore be interpreted with caution (Elith et al., 2010). Despite this limitation, correlative ecological niche models remain useful for identifying broad-scale associations and potential shifts in distribution. To account for this, we analysed two distinct time periods (2015-2020 and 2020-2023).”

      Line 123, we will revise “These findings underscore the ability of pre-2020 models in forecasting the recent geographic distribution of ecological suitability for H5Nx and H5N1 occurrences” to “These results suggest that pre-2020 models captured broad patterns of suitability for H5Nx and H5N1 outbreaks, while post-2020 models provided a closer fit to the more recent epidemiological situation”.

      (3) To generate global prediction maps via ENM, only variables that exist at an appropriate resolution over the desired area can be supplied as covariates. What processes could influence the changing epidemiology of a pathogen, and are there covariates that represent them? Introduction to a new geographic area (continent) with a naive population, immunity in previously exposed populations, control measures to limit spread such as vaccination or destruction of vulnerable populations or flocks? Might those control measures be more or less likely depending on the country, as a function of its resources and governance? There aren't globally available datasets that speak to those factors, so the question is not why they were omitted but rather whether the authors' decision to choose ENMs, given their question, was justified. How valuable are insights based on patterns of correlation change when considering different temporal sets of HPAI cases in relation to a common and somewhat anachronistic set of covariates?

      We agree that the ecological niche models trained in our study are limited to environmental and host factors, as described in the Methods section with the selection of predictors. While such models cannot capture causality or represent processes such as immunity, control measures, or governance, they remain a useful tool for identifying broad associations between outbreak occurrence and environmental context. Our study cannot infer the full mechanisms driving changes in HPAI epidemiology, but it does provide a globally consistent framework to examine how associations with available covariates vary across time periods.

      (4) In general the study is somewhat incoherent with respect to time. Though the case data come from different time periods, each response dataset was modelled separately using exactly the same covariate dataset that predated both sets. That decision should be understood as a strong assumption on the part of the authors that conditions the interpretation: the world (as represented by the covariate set) is immutable, so the model has to return different correlative associations between the case data and the covariates to explain the new data. While the world represented by the selected covariates *may* be relatively stable (could be statistically confirmed), what about the world not represented by the covariates (see point 3)?

      We used the same covariate layers for both periods, which indeed assumes that these environmental and host factors are relatively stable at the global scale over the short timeframe considered. We believe this assumption is reasonable, as poultry density, land cover, and climate baselines do not change drastically between 2015 and 2023 at the resolution of our analysis. We agree, however, that unmeasured processes such as control measures, immunity, or governance may have changed during this time and are not captured by our covariates.
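      The Reviewer notes that the stability of the world represented by the covariates "could be statistically confirmed". One minimal check, assuming two aligned versions of a covariate layer from different dates were available, is the cell-by-cell Pearson correlation together with the mean absolute change. The layer name and values below are hypothetical. A high correlation says nothing about processes that are not represented by any covariate, such as control measures or immunity.

```python
from math import sqrt


def layer_stability(layer_t0, layer_t1):
    """Pearson correlation and mean absolute change between two versions of a
    covariate layer, aligned cell by cell (hypothetical inputs)."""
    if len(layer_t0) != len(layer_t1) or len(layer_t0) < 2:
        raise ValueError("layers must be aligned and have at least two cells")
    n = len(layer_t0)
    m0 = sum(layer_t0) / n
    m1 = sum(layer_t1) / n
    cov = sum((a - m0) * (b - m1) for a, b in zip(layer_t0, layer_t1))
    var0 = sum((a - m0) ** 2 for a in layer_t0)
    var1 = sum((b - m1) ** 2 for b in layer_t1)
    if var0 == 0 or var1 == 0:
        raise ValueError("a constant layer has no defined correlation")
    r = cov / sqrt(var0 * var1)
    mean_abs_change = sum(abs(b - a) for a, b in zip(layer_t0, layer_t1)) / n
    return r, mean_abs_change


# Hypothetical cropland fraction per cell at two dates.
cropland_2015 = [0.10, 0.40, 0.35, 0.80, 0.05]
cropland_2023 = [0.12, 0.42, 0.30, 0.82, 0.05]
r, mac = layer_stability(cropland_2015, cropland_2023)
print(round(r, 3), round(mac, 3))
```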

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      - Line 400-401: "over the 2003-2016 periods" has an extra "s"; "two host species" (with reference to wild and domestic birds) would be more precise as "two host groups".

      - Remove the comma on line 404.

      Many thanks for these comments, we have modified the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Most of my work this round is encapsulated in the public part of the review.

      The authors responded positively to the review efforts from the previous round, but I was underwhelmed by the resulting changes to the text, particularly with regard to limiting assumptions: the way they augmented the text to address limitations raised in review downplayed the importance of the assumptions they have made. They acknowledge the significance of each limitation in their rejoinder, but the amended text merely notes the limitation without giving any sense of what it means for their interpretation of the findings of this study.

      The abstract and findings are essentially unchanged from the previous draft.

      I still feel the near causal statements of interpretation about the covariates are concerning. These models really are not a good candidate for supporting the inference that they are making and there seem to be very strong arguments in favour of adding covariates that are not globally available.

      We never claimed causal interpretation, and we have consistently framed our analyses in terms of associations rather than mechanisms. We acknowledge that one phrasing in the research questions (“Which factors can explain…”) could be misinterpreted, and we are correcting this in the revised version to read “Which factors are associated with…”. Our approach follows standard ecological niche modelling practice, which identifies statistical associations between occurrence data and covariates. As noted in the Discussion section, these associations should not be interpreted as direct causal mechanisms. Finally, all interpretive points in the manuscript are supported by published literature, and we consider this framing both appropriate and consistent with best practice in ecological niche modelling (ENM) studies.

      We assessed predictor contributions using the “relative influence” metric, the terminology used by the R package “gbm” (Ridgeway, 2020). This metric quantifies the contribution of each variable to model fit across all trees, rescaled to sum to 100%, and should be interpreted as an association rather than a causal effect.
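
      For readers unfamiliar with the metric, the final rescaling step described above can be illustrated with a minimal Python sketch (not the authors' code). It assumes the per-predictor totals of loss improvement, summed over all trees, have already been extracted from a fitted boosted regression tree model; the predictor names and values are hypothetical.

```python
from typing import Dict


def relative_influence(raw_improvement: Dict[str, float]) -> Dict[str, float]:
    """Rescale per-predictor improvement totals so that they sum to 100%.

    raw_improvement maps each (hypothetical) predictor name to the total
    improvement in the loss attributed to splits on that predictor,
    summed across all trees of a fitted boosted model.
    """
    total = sum(raw_improvement.values())
    if total <= 0:
        raise ValueError("total improvement must be positive")
    return {name: 100.0 * value / total for name, value in raw_improvement.items()}


# Illustrative values only; they are not results from the study.
example = {"poultry_density": 6.0, "open_water": 3.0, "temperature": 1.0}
print(relative_influence(example))  # {'poultry_density': 60.0, 'open_water': 30.0, 'temperature': 10.0}
```

      Because the percentages are relative to the other predictors in the same model, a high value indicates a stronger statistical association within that model, not a causal effect.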

      L65-66 The general difficulty of interpreting ENM output with range-shifting species should be cited here to alert readers that they should not blithely attempt what follows at home.

      I believe that their analysis is interesting and technically very well executed, so it has been a disappointment and hard work to write this assessment. My rough-cut last paragraph of a reframed intro would go something like - there are many reasons in the literature not to do what we are about to do, but here's why we think it can be instructive and informative, within certain guardrails.

      To acknowledge this comment and the previous one, we revised lines 65-66 to: “However, recent outbreaks raise questions about whether earlier ecological niche models still accurately predict the current distribution of areas ecologically suitable for the local circulation of HPAI H5 viruses. Ecological niche model outputs for range-shifting pathogens must therefore be interpreted with caution (Elith et al., 2010). Despite this limitation, correlative ecological niche models remain useful for identifying broad-scale associations and potential shifts in distribution.”

      We respectfully disagree with the Reviewer’s statement that “_there are many reasons in the literature not to do what we are about to do_”. All modelling approaches, including mechanistic ones, have limitations, and the literature is clear on both the strengths and constraints of ecological niche models. Our manuscript openly acknowledges these limits and frames our findings accordingly. We therefore believe that our use of an ENM approach is justified and contributes valuable insights within these well-defined boundaries.

      Reference: Ridgeway, G. (2007). Generalized Boosted Models: A guide to the gbm package. Update, 1(1), 2007.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      I am concerned by the authors' conceptualisation of "niche" within the manuscript. Is the "niche" we are modelling the niche of the pathogen itself? The niche of the (wild) bird host species as a group? The niche of HPAI transmission within (wild) bird host species (i.e., an intersection of pathogen and bird niches)? Or the niche of HPAI transmission in poultry? The precise niche being modelled should be clarified in the Introduction or early in the Methods of the manuscript. The first two definitions of niche listed above are relevant, but separate from the niche modelled in the manuscript - this should be acknowledged.

      We acknowledge that these concepts were probably not defined clearly enough in the previous version of our manuscript, and we have now included an explicit definition in the fourth paragraph of the Introduction section: “We developed separate ecological niche models for wild and domestic bird HPAI occurrences, these models thus predicting the ecological suitability for the risk of local viral circulation leading to the detection of HPAI occurrences within each host group (rather than the niche of the virus or the host species alone).”

      The authors should consider the precise transmission cycle involved in each HPAI case: "index cases" in farmed poultry, caused by "spillover" from wild birds, are relevant to the wildlife transmission cycle, while the ecological conditions coinciding with subsequent transmission in farmed poultry are likely to be fundamentally different. (For example, subsequent transmission is not conditional on the presence of wild birds.) Modelling these two separate, but linked, transmission cycles together may omit important nuances from the modelling framework.

      We thank the Reviewer for highlighting the distinction between primary (wild-to-domestic) and secondary (farm-to-farm) transmission cycles. Our modelling framework was designed to assess the ecological suitability of HPAI occurrences in wild and domestic birds separately. In the domestic poultry models, the response variable consists of confirmed outbreak data, which do not distinguish between cases resulting from primary or secondary infections.

      One of the aims of the study is to evaluate the spatial distribution of areas ecologically suitable for local H5N1/x circulation either leading to domestic or wild bird cases, i.e. to identify environmental conditions where the virus may have persisted or spread, whether as a result of introduction by wild birds or farm-to-farm transmission. Introducing mechanistic distinctions in the response variable would not necessarily improve or affect the ecological suitability maps, since each type of transmission is likely to be associated with different covariates that are included in the models.

      Also, the EMPRES-i database does not indicate whether each record corresponds to an index case or a secondary transmission event, so in practice it would not be possible to produce two different models. However, we agree that distinguishing between types of transmission is an interesting perspective for future research. This could be explored, for example, by mapping interfaces between wild and domestic bird populations or by inferring outbreak transmission trees using genomic data when available.

      To avoid confusion, we now explicitly clarify this aspect in the Materials and Methods section: “It is important to note that the EMPRES-i database does not distinguish between index cases (e.g., primary spillover from wild birds) and secondary farm-to-farm transmissions. As such, our ecological niche models are trained on confirmed HPAI outbreaks in poultry that may result from different transmission dynamics — including both initial introduction events influenced by environmental factors and subsequent spread within poultry systems.”

      We now also address this limitation in the Discussion section: “Finally, our models for domestic poultry do not distinguish between primary introduction events (e.g., spillover from wild birds) and secondary transmission between farms due to limitations in the available surveillance data. While environmental factors likely influence the risk of initial spillover events, secondary spread is more often driven by anthropogenic factors such as biosecurity practices and poultry trade, which are not included in our current modelling framework.”

      The authors should clarify the meaning of "spillover" within the HPAI transmission cycle: if spillover transmission is from wild birds to farmed poultry, then subsequent transmission in poultry is separate from the wildlife transmission cycle. This is particularly relevant to the Discussion paragraph beginning at line 244: does "farm to farm transmission" have a distinct ecological niche to transmission between wild birds, and transmission between wild birds and farmed birds? And while there has been a spillover of HPAI to mammals, could the authors clarify that these detections are dead-end? And not represented in the dataset? Dhingra et al., 2016 comment on the contrast between models of "directly transmitted" pathogens, such as HPAI, and vector-borne diseases: for vector-borne diseases, "clear eco-climatic boundaries of vectors can be mapped", whereas "HPAI is probably not as strongly environmentally constrained". This is an important piece of nuance in their Discussion and a comment to a similar effect may be of use in this manuscript.

      Following the Reviewer’s previous comment, we have now added clarifications in the Methods and Discussion sections defining spillover as the transmission of HPAI viruses from wild birds to domestic poultry (index cases), and secondary transmission as onward spread between farms. As mentioned in our answer above, we now emphasise that our models do not distinguish these dynamics, which are likely to be influenced by different drivers — ecological in the case of spillover, and often anthropogenic (e.g., poultry trade movement, biosecurity) in the case of farm-to-farm transmission.

      The discussion regarding farm-to-farm transmission and spillovers is indeed an interpretation derived from the covariates analysis (see the second paragraph in the Discussion section). Specifically, we observed a stronger association between HPAI occurrences and domestic bird density after 2020, which may suggest that secondary infections (e.g., farm-to-farm transmission) became more prominent or more frequently reported. We however acknowledge that our data do not allow us to distinguish primary introductions from secondary transmission events, and we have added a sentence to explicitly clarify this: “However, this remains an interpretation, as the available data do not allow us to distinguish between index cases and secondary transmission events.”

      We thank the Reviewer for raising the point of mammalian infections. While spillover events of HPAI into mammals have been documented, these detections are generally considered dead-end infections and do not currently represent sustained transmission chains. As such, they fall outside the scope of our study, which focuses on avian hosts and models ecological suitability for outbreaks in wild and domestic birds. However, we agree that future work could explore the spatial overlap between mammalian outbreak detections and ecological suitability maps for wild birds to assess whether such spillovers may be linked to localised avian transmission dynamics.

      Finally, we have added a comment about the differences between pathogens strongly constrained by the environment and HPAI: “This suggests that HPAI H5Nx is not as strongly environmentally constrained as vector-borne pathogens, for which clear eco-climatic boundaries (e.g., vector borne diseases) can be mapped (Dhingra et al., 2016).” This aligns with the interpretation provided by Dhingra and colleagues (2016) and helps contextualise the predictive limitations of ecological niche models for directly transmitted pathogens like HPAI.

      There are several places where some simple clarification of language could answer my questions related to ecological niches. For example, on line 74, "the ecological niche" should be followed by "of the pathogen", or "of HPAI transmission in wild birds", or some other qualifier that is most appropriate to the Authors' conceptualisation of the niche modelled in the manuscript. Similarly, in the following sentence, "areas at risk" could be followed by "of transmission in wild birds", to make the transmission cycle that is the subject of modelling clear to the reader. On line 83, it is not clear who or what is the owner of "their ecological niches": is this "poultry and wild birds", or the pathogen?

      We agree with that suggestion and have now modified the related part of the text accordingly (e.g., “areas at risk for local HPAI circulation” and “of HPAI in wild or domestic birds”).

      I am concerned by the authors' treatment of sampling bias in their BRT modelling framework. If we are modelling the niche of HPAI transmission, we would expect places that are more likely to be subject to disease surveillance to be represented in the set of locations where the disease has been detected. I do not agree that pseudo-absence points are sampled "to account for the lack of virus detection in some areas" - this description is misleading and does not match the following sentence ("pseudo-absence points sampled ... to reflect the greater surveillance efforts ..."). The distribution of pseudo-absences should aim to capture the distribution of probable disease surveillance, as these data act as a stand-in for missing negative surveillance records. It is sensible that pseudo-absences for disease detection in wild birds are sampled proportionately to human population density, as the disease is detected in dead wild birds, which are more likely to be identified close to areas of human occupation (as stated on line 163). However, I do not agree that the same applies to poultry - the density of farmed poultry is likely to be a better proxy for surveillance intensity in farmed birds. Human population density and farmed poultry density may be somewhat correlated (i.e., both are low in remote areas), but poultry density is likely to be higher in rural areas, which are assumed to have relatively lower surveillance intensity under the current approach. The authors allude to this in the Discussion: "monitoring areas with high intensive chicken densities ... remains crucial for the early detection and management of HPAI outbreaks".

      We agree with the Reviewer's comment that poultry density could potentially have been used to guide the sampling of the pseudo-absences considered when training domestic bird models. We nevertheless prefer to keep using a human population density layer as a proxy for surveillance bias to define the relative probability of sampling pseudo-absence points in the different pixels of the background area considered when training our ecological niche models. Indeed, given that poultry density is precisely one of the predictors that we aim to test, using this environmental layer to define the relative probability of sampling pseudo-absences would introduce a certain level of circularity in our analytical procedure, e.g. by artificially increasing the influence of that particular variable in our models.

      Furthermore, it is also worth noting that, to better account for variations in surveillance intensity, we also adjusted the sampling effort by allocating pseudo-absences in proportion to the number of confirmed outbreaks per administrative unit (country or sub-national regions for Russia and China). This approach aimed to reduce bias caused by uneven reporting and surveillance efforts between regions. Additionally, we restricted model training to countries or regions with a minimum surveillance threshold (at least five confirmed outbreaks per administrative unit). Therefore, both presence and pseudo-absence points originated from areas with more consistent surveillance data.
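
      The two-level weighting described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: background pixels are represented as hypothetical (pixel_id, admin_unit, human_population_density) tuples, pseudo-absences are allocated to eligible units in proportion to their outbreak counts, and pixels within a unit are drawn with replacement with probability proportional to human population density.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical background pixel: (pixel_id, admin_unit, human_population_density)
Pixel = Tuple[str, str, float]


def sample_pseudo_absences(
    pixels: List[Pixel],
    outbreaks_per_unit: Dict[str, int],
    n_total: int,
    min_outbreaks: int = 5,
    seed: int = 1,
) -> List[str]:
    rng = random.Random(seed)
    # Keep only administrative units that meet the surveillance threshold.
    eligible = {u: n for u, n in outbreaks_per_unit.items() if n >= min_outbreaks}
    total_outbreaks = sum(eligible.values())
    if total_outbreaks == 0:
        return []
    sampled: List[str] = []
    for unit, n_outbreaks in eligible.items():
        # Allocate pseudo-absences in proportion to the unit's outbreak count
        # (rounding means the grand total may differ slightly from n_total).
        k = round(n_total * n_outbreaks / total_outbreaks)
        candidates = [p for p in pixels if p[1] == unit]
        weights = [p[2] for p in candidates]
        if k == 0 or sum(weights) <= 0:
            continue
        # Within a unit, favour densely populated pixels as a proxy for
        # surveillance bias; sampling is with replacement for simplicity.
        chosen = rng.choices(candidates, weights=weights, k=k)
        sampled.extend(p[0] for p in chosen)
    return sampled


# Unit "C" falls below the five-outbreak threshold and receives no pseudo-absences.
pixels = [("a1", "A", 120.0), ("a2", "A", 0.0), ("b1", "B", 15.0), ("c1", "C", 80.0)]
print(sample_pseudo_absences(pixels, {"A": 12, "B": 8, "C": 2}, n_total=10))
```

      With these toy inputs, unit A receives six draws and unit B four; pixel a2 is never selected because its population weight is zero. A production workflow would more likely sample without replacement on a raster grid, which this sketch does not attempt.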

      We acknowledge in the Materials and Methods section that the approach proposed by the Reviewer could have been used: “Another approach to sampling pseudo-absences would have been to distribute them according to the density of domestic poultry.” Finally, our approach is also justified in our response to the next comment of the Reviewer.

      Having written my review, including the paragraph above, I briefly scanned Dhingra et al. and found that they provide justification for the use of human population density to sample pseudo-absences in farmed birds: "the Empres-i database compiles outbreak locations data from very heterogeneous sources and in the absence of explicit GPS location data, the geo-referencing of individual cases is often through the use of place name gazetteers that will tend to force the outbreak location populated place, rather in the exact location of the farm where the disease was found, which would introduce a bias correlated with human population density." This context is entirely missing from the manuscript under review. However, I maintain the comment in the paragraph above: have the Authors trialled sampling pseudo-absences from poultry density layers?

      We agree with the Reviewer’s comment and have now added this precision in the Materials and Methods section (in the third paragraph dedicated to ecological niche modelling): “However, as pointed out by Dhingra and colleagues (2016), the locations of outbreaks in the EMPRES-i database are often georeferenced using place name nomenclatures due to a lack of accurate GPS data, which could introduce a spatial bias towards populated areas.”

      The authors indirectly acknowledge the role of sampling bias in model predictions at line 163; however, this point could be clearer: there is sampling bias in the set of locations where HPAI has been observed, and failure to adequately replicate this sampling bias in pseudo-absence data could cause covariates that are correlated with the observation distribution to appear to be correlated with the target distribution. This point is alluded to but should be clearly acknowledged to allow the reader to appropriately interpret your results. I understand the point being made on line 163 is that surveillance of HPAI in wild birds has become more structured and less opportunistic over time - if this is the case, a statement to this effect could replace "which could influence earlier data sets", which is a little ambiguous. The Authors acknowledge the role of sampling bias in lines 241-242 - this may be a good place to remind the reader that they have attempted to incorporate sampling bias through the selection of their pseudo-absence dataset, particularly for wild bird models.

      We thank the Reviewer for this comment. We have now clarified in the text that observed data on HPAI occurrence are inherently influenced by heterogeneous surveillance efforts and that failure to replicate this bias in pseudo-absence sampling could effectively lead to misleading correlations with covariates associated with surveillance effort rather than true ecological suitability. We have now rephrased the related sentence as follows: “This decline may indicate a reduced bias in observation data: typically, dead wild birds are more frequently found near human-populated areas due to opportunistic detections, whereas more recent surveillance efforts have become increasingly proactive (Giacinti et al., 2024).”

      Dhingra et al. aimed to account for the effect of mass vaccination of birds in China. This does not appear to be included in the updated models - is this a relevant covariate to consider in updated models? Are the models trained on pre-2020 data predicting to post-2020 given the same presence dataset as previous models? It may be helpful to provide a comment on this if we consider the pre-2020 models in this work to be representative of pre-2020 models as a cohort. Given the framing of the manuscript as an update to Dhingra et al., it may be useful for the authors to briefly summarise any differences between the existing models and updated models. Dhingra et al., also examine spatial extrapolation, which is not addressed here. Environmental extrapolation may be a useful metric to consider: are there areas where models are extrapolating that are predicted to be at high risk of HPAI transmission? Finally, they also provide some inset panels on global maps of model predictions - something similar here may also be useful.

      We thank the Reviewer for these comments. Vaccination coverage is indeed a relevant covariate for HPAI suitability in domestic birds. However, we did not include this variable in our updated models for two reasons. First, comprehensive vaccination data were only available for China, so it is not possible to include this variable in a global model. Second, available data were outdated and vaccination strategies can vary substantially over time.

      We however agree with the Reviewer that the Materials and Methods section did not clearly describe the differences from Dhingra et al. (2016), and we now detail these differences at the beginning of the Materials and Methods section: “Our approach is similar to the one implemented by Dhingra and colleagues (2016). While Dhingra et al. (2016) developed their models only for domestic birds over the 2003-2016 periods, our models were developed for two host species separately (wild and domestic birds) and for two time periods (2016-2020 and 2020-2023).”

      We also detail the main difference concerning pseudo-absence sampling: Dhingra and colleagues (2016) used human population density to sample pseudo-absences to reflect potential surveillance bias and also applied spatial filtering (minimum and maximum distances from presence points). We adopted a similar strategy but also incorporated outbreak counts per country or province (in the case of China and Russia) into the pseudo-absence sampling process to further account for within-country surveillance heterogeneity. We have now added these specifications in the Materials and Methods section: “To account for heterogeneity in AIV surveillance and minimise the risk of sampling pseudo-absences in poorly monitored regions, we restricted our analysis to countries (or administrative level 1 units in China and Russia) with at least five confirmed outbreaks. Unlike Dhingra et al. (2016), who sampled pseudoabsences across a broader global extent, our sampling was limited to regions with demonstrated surveillance activity. In addition, we adjusted the density of pseudo-absence points according to the number of reported outbreaks in each country or admin-1 unit, as a proxy for surveillance effort — an approach not implemented in this previous study.”
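
      The spatial filtering mentioned above can be illustrated with a short Python sketch. It assumes that a candidate pseudo-absence is retained only if its great-circle distance to the nearest presence point lies between a minimum and a maximum threshold; the threshold values in the example are placeholders, not those used by Dhingra et al. (2016) or in this study.

```python
import math
from typing import List, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LonLat, b: LonLat) -> float:
    """Great-circle distance between two points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def filter_by_distance_band(
    candidates: List[LonLat], presences: List[LonLat], min_km: float, max_km: float
) -> List[LonLat]:
    """Keep candidates whose nearest presence point lies within [min_km, max_km]."""
    kept = []
    for c in candidates:
        nearest = min(haversine_km(c, p) for p in presences)
        if min_km <= nearest <= max_km:
            kept.append(c)
    return kept


# Placeholder thresholds, chosen only for illustration.
presences = [(0.0, 0.0)]
candidates = [(0.0, 0.05), (0.0, 1.0), (0.0, 10.0)]
print(filter_by_distance_band(candidates, presences, min_km=10.0, max_km=500.0))  # [(0.0, 1.0)]
```

      The minimum distance avoids placing pseudo-absences on top of known outbreaks, while the maximum keeps them within a plausibly surveyed neighbourhood; how strongly each threshold matters depends on the data and is not assessed here.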

      We have now also provided a comparison between the different outputs, particularly in the Results section: “Our findings were overall consistent with those previously reported by Dhingra and colleagues (Dhingra et al., 2016), who used data from January 2004 to March 2015 for domestic poultry. However, some differences were noted: their maps identified higher ecological suitability for H5 occurrences before 2016 in North America, West Africa, eastern Europe, and Bangladesh, while our maps mainly highlight ecologically suitable regions in China, South-East Asia, and Europe (Fig. S5). In India, analyses consistently identified high ecologically suitable areas for the risk of local H5Nx and H5N1 circulation for the three time periods (pre-2016, 2016-2020, and post-2020). Similar to the results reported by Dhingra and colleagues, we observed an increase in the ecological suitability estimated for H5N1 occurrence in South America's domestic bird populations post-2020. Finally, Dhingra and colleagues identified high suitability areas for H5Nx occurrence in North America, which are predicted to be associated with a low ecological suitability in the 2016-2020 models.”

      We acknowledge that some regions predicted as highly suitable correspond to areas where extrapolation likely occurs due to limited or no recorded outbreaks. We have now added these specifications when discussing the resulting suitability maps obtained for domestic birds: “For H5Nx post-2020, areas of high predicted ecological suitability, such as Brazil, Bolivia, the Caribbean islands, and Jilin province in China, likely result from extrapolations, as these regions reported few or no outbreaks in the training data”, and, for wild birds: “Some of the areas with high predicted ecological suitability reflect the result of extrapolations. This is particularly the case in coastal regions of West and North Africa, the Nile Basin, Central Asia (Kyrgyzstan, Tajikistan, Uzbekistan), Brazil (including the Amazon and coastal areas), southern Australia, and the Caribbean, where ecological conditions are similar to those in areas where outbreaks are known to occur but where records of outbreaks are still rare.”

      For wild birds (H5Nx, post-2020), high ecological suitability was predicted along the West and North African coasts, the Nile basin, Central Asia (e.g., Kyrgyzstan, Tajikistan, Uzbekistan), the Brazilian coast and Amazon region, Caribbean islands, southern Australia, and parts of Southeast Asia. Ecological suitability estimated in these regions may directly result from extrapolations and should therefore be interpreted cautiously.
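
      One common way to make statements such as "likely result from extrapolations" explicit is the multivariate environmental similarity surface (MESS) introduced by Elith et al. (2010), which is the kind of environmental extrapolation metric the Reviewer refers to. The Python sketch below is an illustration only, not an analysis performed in the study: for each covariate it compares a grid cell's value with the values at the training locations, and the MESS score is the minimum similarity across covariates, with negative values flagging environmental extrapolation. Covariate names and values are hypothetical.

```python
from typing import Dict, List


def similarity(value: float, reference: List[float]) -> float:
    """Univariate similarity of one covariate value to its reference values
    (Elith et al., 2010); negative when outside the reference range."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference values must not be constant")
    # f: percentage of reference points whose value is smaller than `value`.
    f = 100.0 * sum(1 for r in reference if r < value) / len(reference)
    if f == 0:
        return 100.0 * (value - lo) / (hi - lo)
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    return 100.0 * (hi - value) / (hi - lo)


def mess(cell: Dict[str, float], reference: Dict[str, List[float]]) -> float:
    """MESS score of one grid cell: the minimum similarity over covariates."""
    return min(similarity(cell[name], values) for name, values in reference.items())


# Hypothetical covariate values at training (presence and pseudo-absence) locations.
reference = {"temperature": [10.0, 20.0, 30.0, 40.0], "open_water": [0.0, 1.0, 2.0, 3.0]}
print(mess({"temperature": 25.0, "open_water": 1.5}, reference))  # 100.0
print(mess({"temperature": 50.0, "open_water": 1.5}, reference))  # negative: extrapolation
```

      Mapping such scores alongside the suitability maps would show which high-suitability areas lie outside the sampled environmental range; the sketch does not address extrapolation arising from novel combinations of covariates that are each within range.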

      We also added a discussion of the extrapolation for wild birds (in the Discussion section): “Interestingly, our models extrapolate environmental suitability for H5Nx in wild birds in areas where few or no outbreaks have been reported. This discrepancy may be explained by limited surveillance or underreporting in those regions. For instance, there is significant evidence that Kazakhstan and Central Asia play a role as a centre for the transmission of avian influenza viruses through migratory birds (Amirgazin et al., 2022; FAO, 2005; Sultankulova et al., 2024). However, very few wild bird cases are reported in EMPRES-i. In contrast, Australia appears environmentally suitable in our models, yet no incursion of HPAI H5N1 2.3.4.4b has occurred despite the arrival of millions of migratory shorebirds and seabirds from Asia and North America. Extensive surveillance in 2022 and 2023 found no active infections nor evidence of prior exposure to the 2.3.4.4b lineage (Wille et al., 2024; Wille and Klaassen, 2023).”

      We agree that inset panels can be helpful for visualising global patterns. However, all resulting maps are available on the MOOD platform (https://app.mood-h2020.eu/core), which provides an interactive interface allowing users to zoom in and out, identify specific locations using a background map, and explore the results in greater detail. This resource is referenced in the manuscript to guide readers to the platform.

      Related to my review of the manuscript's conceptualisation above, there are several inconsistencies in terminology in the manuscript - clearing these up may help to make the methods and their justification clearer to the reader. The "signal" that the models are estimating is variously described as "susceptibility" and "risk" (lines 179-180), "HPAI H5 ecological suitability" (line 78), "likelihood of HPAI occurrences" (line 139), "risk of HPAI circulation" (line 187), "distribution of occurrence data" (line 428). Each of these quantities has slightly different meanings and it is confusing to the reader that all of these descriptors are used for model output. "Likelihood of HPAI occurrences" is particularly misleading: ecological niche models predict high suitability for a species in areas that are similar to environments where it has previously been identified, without imposing constraints on species movement. It is intuitively far more likely that there will be HPAI occurrences in areas where the disease is already established than in areas where an introduction event is required, however, the niche models in this work do not include spatial relationships in their predictions.

      We agree with the Reviewer’s comments. We have now modified the text so that in the Results section we refer to ecological suitability when referring to the outputs of the models. In the context of our Discussion section, we then interpret this ecological suitability in terms of risk, as areas with high ecological suitability being more likely to support local HPAI outbreaks.

      I also caution the authors in their interpretation of the results of BRTs, which are correlative models, so therefore do not tell us what causes a response variable, but rather what is correlated with it. On Line 31, "correlated with" may be more appropriate than "influenced by". On Line 82, "correlated with" is more appropriate than "driving". This is particularly true given the authors' treatment of sampling bias.

      We agree with the Reviewer’s comment and have now rephrased these sentences as follows: “The spatial distribution of HPAI H5 occurrences in wild birds appears to be primarily correlated with urban areas and open water regions” and “Our results provide a better understanding of HPAI dynamics by identifying key environmental factors correlated with the increase in H5Nx and H5N1 cases in poultry and wild birds, investigating potential shifts in their ecological niches, and improving the prediction of at-risk areas.”

      The following sentences in line 201 are ambiguous: "For both H5Nx and H5N1, however, isolated areas on the risk map should be interpreted with caution. These isolated areas may result from sparse data, model limitations, or local environmental conditions that may not accurately reflect true ecological suitability." By "isolated", do the authors mean remote? Or ecologically dissimilar from the set of locations where HPAI has been detected? Or ecologically dissimilar from the set of locations in the joint set of HPAI detection locations and pseudo-absences? Or ecologically similar to the set of locations where HPAI has been detected but spatially isolated? These four descriptors are each slightly different and change the meaning of the sentences. "Model limitations" are also ambiguous - could the authors clarify which specific model limitations they are referring to here? Ultimately, the point being made is probably that a model may predict high ecological suitability for HPAI transmission in areas where the disease has not yet been identified, or where a model is extrapolating in environmental space, however, uncertainty in these predictions may be greater than uncertainty in predictions in areas that are represented in surveillance data. A clear comment on model uncertainty and how it is related to the surveillance dataset and the covariate dataset is currently missing from the manuscript and would be appropriate in this paragraph.

      We understand the Reviewer’s concerns regarding these potential ambiguities, and have now rephrased these sentences as follows: “For both H5Nx and H5N1, certain areas of predicted high ecological suitability appear spatially isolated, i.e. surrounded by regions of low predicted ecological suitability. These areas likely meet the environmental conditions associated with past HPAI occurrences, but their spatial isolation may imply a lower risk of actual occurrences, particularly in the absence of nearby outbreaks or relevant wild bird movements.”

      I am concerned by the wording of the following sentence: "The risk maps reveal that high-risk areas have expanded after 2020" (line 203). This statement could be supported by an acknowledgement of the assumptions the models make of the HPAI niche: are we saying that the niche is unchanged in environmental space and that there are now more geographic areas accessible to the pathogen, or that the niche has shifted or expanded, and that there are now more geographic areas accessible to the pathogen? The authors should review the sentence beginning on line 117: if models trained on data from the old timepoint predicting to the new timepoint are almost as good as models trained on data from the new timepoint predicting to the new timepoint, doesn't this indicate that the niche, as the models are able to capture it, has not changed too much?

      We thank the Reviewer for this comment. The statement that "high-risk areas have expanded after 2020" indeed refers to an increase in the geographic extent of areas predicted to have high ecological suitability in models trained on post-2020 data. This expansion likely reflects new outbreak data from regions that had not previously reported cases, which in turn influenced model training.

      However, models trained on pre-2020 data retain reasonable predictive performance when applied to post-2020 data (see the AUC results reported in Table S1), suggesting that the models indicate an expansion in ecological suitability but do not provide definitive evidence of a shift in the ecological niche. We have now added a statement at the end of this paragraph to clarify this point: “However, models trained on pre-2020 data maintained reasonable predictive performance when tested on post-2020 data, suggesting that the overall ecological niche of HPAI did not drastically shift over time.”
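
      The transfer test described above (training on pre-2020 occurrences, evaluating on post-2020 ones) is scored with the AUC. As a minimal sketch, assuming presence and pseudo-absence points have already been scored by the pre-2020 model (the function name and inputs are hypothetical, not the authors' code), the AUC is the probability that a randomly chosen presence receives a higher predicted suitability than a randomly chosen pseudo-absence, with ties counted as one half:

```python
from itertools import product


def auc(presence_scores, absence_scores):
    """Rank-based AUC, equivalent to the Mann-Whitney U statistic / (n1 * n2).

    presence_scores: predicted suitability at (post-2020) occurrence points.
    absence_scores:  predicted suitability at (post-2020) pseudo-absence points.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p, a in product(presence_scores, absence_scores):
        if p > a:
            wins += 1.0
        elif p == a:
            wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical scores produced by a model trained on pre-2020 data
print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

      An AUC close to that of a model trained on post-2020 data would support the reading that the niche, as captured by the models, changed little.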

      The final two paragraphs of the Results might be more helpful to include at the beginning of the Results, as the data discussed there are inputs to the models. Is it possible that the "rise in Shannon index for sea birds" that "suggests a broadening of species diversity within this category from 2020 onwards" is caused by the increasingly structured surveillance of HPAI in wild birds alluded to earlier in the Results? Is the "prevalence" discussed in line 226 the frequency of the families Laridae and Sulidae being represented in HPAI detection data? Or the abundance of the bird species themselves? The language here is a little ambiguous. Discussion of particular values of Shannon/Simpson indices is slightly out of context as the meanings of the indices are in the Methods - perhaps a brief explanation of the uses of Shannon/Simpson indices may be helpful to the reader here. It may also be helpful to readers who are not acquainted with avian taxonomy to provide common names next to formal names (for example, in brackets) in the body of the text, as this manuscript is published in an interdisciplinary journal.

      We thank the Reviewer for these comments. First, we acknowledge that the paragraphs on species diversity and Shannon/Simpson indices describe important data, but we have chosen to present them after the main modelling results in order to maintain a logical narrative flow. Our manuscript first presents the ecological niche models and their predictive performance, followed by interpretations of the observed patterns, including changes in avian host diversity. Diversity indices were used primarily to support and contextualise the patterns observed in the modelling results.

      For clarity, we have revised the relevant paragraphs in the Results (i) to briefly remind readers of the interpretation of the Shannon and Simpson indices (“Note that these indices reflect the diversity of bird species detected in outbreak records, not necessarily their abundance in the wild”) and (ii) to clarify that “prevalence” refers to the frequency of HPAI detection in wild bird species of the Laridae (gulls) and Sulidae (boobies and gannets) families, and not their total abundance. Each bird family includes several species, so the “common name” of a family can sometimes also refer to species from other families. We have now added the common names for each family in the manuscript (even if we acknowledge that “penguins” can be ambiguous).
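
      As a reminder of how the two indices behave, the sketch below computes them from the species recorded in outbreak records. It assumes the Shannon index H = -Σ p_i ln p_i and the Gini-Simpson form 1 - Σ p_i², which ranges from 0 to 1; the exact Simpson variant used in the manuscript is an assumption here, and the records are hypothetical.

```python
import math
from collections import Counter


def diversity_indices(species_records):
    """Return (Shannon, Gini-Simpson) indices for a list of species labels.

    Each element is the species recorded in one HPAI detection, so the
    proportions p_i describe detections, not abundance in the wild.
    """
    counts = Counter(species_records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records")
    proportions = [c / total for c in counts.values()]
    shannon = -sum(p * math.log(p) for p in proportions)
    simpson = 1.0 - sum(p * p for p in proportions)  # Gini-Simpson (assumed form)
    return shannon, simpson


# Hypothetical detections: a gull (Laridae) and a gannet (Sulidae), equally represented
h, d = diversity_indices(["Larus argentatus"] * 10 + ["Morus bassanus"] * 10)
print(round(h, 4), d)  # 0.6931 0.5
```

      Adding more species with similar detection counts raises both indices, which is why a rise can reflect either more infected species or more species being sampled.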

      In the Methods, it is stated: "To address the heterogeneity of AIV surveillance efforts and to avoid misclassifying low-surveillance areas as unsuitable for virus circulation, we trained the ecological niche models only considering countries in which five or more cases have been confirmed." However, it is not clear how this processing step prevents low-surveillance areas from being misclassified. If pseudo-absences are appropriately sampled, low-surveillance areas should be less represented in the pseudo-absence dataset, which should lead the models to be uncertain in their predictions of these areas. Perhaps "To address the heterogeneity of AIV surveillance efforts and to avoid sampling pseudo-absence data in realistically low-surveillance areas" is a more accurate introduction to the paragraph. I am not entirely convinced that it is appropriate to remove detection data where the national number of cases is low. This may introduce further sampling bias into the dataset.

      We take this opportunity to further clarify this important step, which aims to mitigate bias associated with countries with substantial uncertainty in reporting and/or potentially insufficient HPAI surveillance data. While we acknowledge that this procedure may exclude countries that had effective surveillance but low virus detection, we argue that it constitutes a relevant conservative approach to minimising the risk of sampling a significant number of pseudo-absence points in areas with relatively high yet undetected local HPAI circulation due to insufficient surveillance. Furthermore, given that five cases over two decades is a relatively low threshold, particularly for a highly transmissible virus such as AIV, non-detection or non-reporting remains a more plausible explanation than true absence.

      To improve clarity, we have now revised the related sentence as follows: “To account for heterogeneity in AIV surveillance and minimise the risk of sampling pseudo-absences in poorly monitored regions, we restricted our analysis to countries (or administrative level 1 units in China and Russia) with at least five confirmed outbreaks.”
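
      The inclusion rule quoted above can be sketched as a count-and-filter step. This is an illustration under assumptions: each confirmed outbreak is represented as a hypothetical (country, admin1) pair, and the countries handled at administrative level 1 (China and Russia) are passed in explicitly.

```python
from collections import Counter


def eligible_units(records, admin1_countries=("China", "Russia"), min_cases=5):
    """Return the spatial units retained for model training.

    records: iterable of (country, admin1) tuples, one per confirmed outbreak.
    For countries in admin1_countries the unit is (country, admin1);
    elsewhere the whole country is the unit.
    """
    def unit(country, admin1):
        return (country, admin1) if country in admin1_countries else (country,)

    counts = Counter(unit(c, a) for c, a in records)
    return {u for u, n in counts.items() if n >= min_cases}


# Hypothetical outbreak records
records = (
    [("France", "Bretagne")] * 6
    + [("Chile", "Valparaiso")] * 2
    + [("China", "Guangdong")] * 5
    + [("China", "Yunnan")] * 1
)
print(sorted(eligible_units(records)))  # [('China', 'Guangdong'), ('France',)]
```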

      The reporting of spatial and temporal resolution of data in the manuscript could be significantly clearer. Is there a reason why human population density is downscaled to 5 arcminutes (~10km at the equator) while environmental covariate data has a resolution of 1km? The projection used is not reported. The authors should clarify the time period/resolution of the covariate data assigned to the occurrence dataset, for example, does "day LST annual mean" represent a particular year pre- or post-2020? Or an average over a number of years? Given that disease detections are associated with observation and reporting dates, and that there may be seasonal patterns in HPAI occurrence, it would be helpful to the reader to include this information when the eco-climatic indices are described. It would also be helpful to the reader to summarise the source, spatial and temporal resolution of all covariates in a table, as in Dhingra et al. Could the Authors clarify whether the duck density layer is farmed ducks or wild ducks?

      The projection is WGS 84 (EPSG:4326) and the resolution of the output maps is around 0.0833 x 0.0833 decimal degrees (i.e. 5 arcmin, or approximately 10 km at the equator). We have now added these specifications in the text: “All maps are in a WGS84 projection with a spatial resolution of 0.0833 decimal degrees (i.e. 5 arcmin, or approximately 10 km at the equator).” In addition, we have now specified in the text that the duck density layer refers to domestic ducks.
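
      For reference, the conversion behind "approximately 10 km" is shown below, assuming an equatorial circumference of about 40,075 km, so that one degree of longitude spans about 111.3 km at the equator. East-west cell width shrinks with latitude φ by a factor of cos φ.

```latex
5\,\text{arcmin} = \left(\tfrac{5}{60}\right)^{\circ} \approx 0.0833^{\circ},
\qquad
0.0833^{\circ} \times \frac{40\,075\ \text{km}}{360^{\circ}}
\approx 0.0833 \times 111.3\ \text{km} \approx 9.3\ \text{km}.
```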

      The environmental variables retrieved for our analyses were available as values averaged over distinct periods of time (for further detail, see Supplementary Information Resources S1, which gives the description and source of each environmental variable included in the original sets of variables and is available at https://github.com/sdellicour/h5nx_risk_mapping). In future work, it would indeed be interesting to associate occurrences with a specific season and match the environmental variables accordingly, especially for viruses such as HPAI whose occurrence has been found to be correlated with season. However, we did not conduct this type of analysis in the present study, in which occurrences were associated with averaged values of environmental data only.

      In line 407, the authors state a number of pseudo-absence points used in modelling, relative to the number of presence points, without clear justification. Note that relative weights can be assigned to occurrence data in most ENM software (e.g., R package gbm), allowing many pseudo-absence points to be sampled to represent the full extent of probable surveillance effort and subsequently down-weighted.
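
      The down-weighting alternative suggested here can be illustrated with a minimal sketch: many pseudo-absences are sampled, and each is weighted so that the pseudo-absences together carry the same total weight as the presences. The equal-total-weight rule is an assumption made for illustration; in R, such a vector would be supplied as case weights (e.g. the `weights` argument of `gbm::gbm`).

```python
def case_weights(n_presence, n_pseudo_absence):
    """Weights for presence points followed by pseudo-absence points.

    Presences get weight 1; pseudo-absences are down-weighted so that
    their summed weight equals the number of presences (assumed rule).
    """
    if n_presence <= 0 or n_pseudo_absence <= 0:
        raise ValueError("counts must be positive")
    w_absence = n_presence / n_pseudo_absence
    return [1.0] * n_presence + [w_absence] * n_pseudo_absence


# Hypothetical: 200 occurrences, 10,000 pseudo-absences spread over the surveyed area
w = case_weights(200, 10_000)
print(w[0], w[-1], round(sum(w[200:]), 6))  # 1.0 0.02 200.0
```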

      We thank the Reviewer for this suggestion. We acknowledge that alternative approaches such as down-weighting pseudo-absence points could offer a certain degree of flexibility in representing surveillance effort. However, we opted for a fixed 1:3 ratio of pseudo-absences to presence points within each administrative unit to ensure a consistent and conservative sampling distribution. This approach aimed to limit overrepresentation of pseudo-absences in areas with sparse presence data, while still reflecting areas of likely surveillance.
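
      The per-unit sampling rule can be sketched as follows. This is a hypothetical illustration, assuming three pseudo-absences per presence point, candidate cells listed per administrative unit, and presence cells excluded from sampling.

```python
import random


def sample_pseudo_absences(presences_by_unit, candidates_by_unit, ratio=3, seed=1):
    """Sample `ratio` pseudo-absences per presence within each unit.

    presences_by_unit:  {unit: [cell_id, ...]} occurrence cells.
    candidates_by_unit: {unit: [cell_id, ...]} all cells in the unit.
    Returns {unit: [cell_id, ...]} sampled without replacement; if a unit
    has too few free cells, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed for reproducible replicates
    sampled = {}
    for unit, presences in presences_by_unit.items():
        occupied = set(presences)
        free = [c for c in candidates_by_unit.get(unit, []) if c not in occupied]
        k = min(ratio * len(presences), len(free))
        sampled[unit] = rng.sample(free, k)
    return sampled


# Hypothetical unit "A" with two occurrence cells among 100 cells
pa = sample_pseudo_absences({"A": [1, 2]}, {"A": list(range(100))})
print(len(pa["A"]))  # 6
```

      Tying the sample size to each unit's presence count keeps pseudo-absences out of units with few or no confirmed cases, which is the conservative behaviour described in the response.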

      There are a number of typographical errors and phrasing issues in the manuscript. A non-exhaustive list is provided below.

      - Line 21: "its" should be "their"

      - Line 25: "HPAI cases"

      These modifications have been made.

      - Line 63: sentence beginning "However" is somewhat out of context - what is it (briefly) about recent outbreaks that challenge existing models?

      We have now edited that sentence as follows: “However, recent outbreaks raise questions about whether earlier ecological niche models still accurately predict the current distribution of areas ecologically suitable for the local circulation of HPAI H5 viruses.”

      - Lines 71 and 390: "AIV" is not defined in the text

      - Line 73: "do" ("are" and "what" are not capitalised)

      These modifications have been made.

      - Line 115: "predictability" should be "predictive capacity"

      We have now replaced “predictability” with “predictive performance”.

      - Line 180: omit "pinpointing"

      - Line 192 sentence beginning "In India," should be re-worded: is the point that there are detections of HPAI here and the model predicts high ecological suitability?

      - Line 195 sentence beginning "Finally," phrasing could be clearer: Dhingra et al. find high suitability areas for H5Nx in North America which are predicted to be low suitability in the new model.

      - Line 237: omit "the" in "with the those"

      - Line 374: missing "."

      - Line 375: "and" should be "to" (the same goes for line 421)

      - Line 448: Rephrase "Simpson index goes" to "The Simpson index ranges"

      These modifications have been made.

      Reviewer #2 (Public Review):

      What is the justification for separating the dataset at 2020? Is it just the gap in-between the avian influenza outbreaks?

      We chose 2020 as a cut-off based on a well-documented shift in HPAI epidemiology, notably the emergence and global spread of clade 2.3.4.4b, which may affect host dynamics and geographic patterns. We have now added this precision in the Materials and Methods section: “We selected 2020 as a cut-off point to reflect a well-documented shift in HPAI epidemiology, notably the emergence and global spread of clade 2.3.4.4b. This event marked a turning point in viral dynamics, influencing both the range of susceptible hosts and the geographical distribution of outbreaks.”

      If the analysis aims to look at changing case numbers and distribution over time, surely the covariate datasets should be contemporaneous with the response?

      Thank you for raising this important point. We acknowledge that, ideally, covariates should match the response temporally; however, such high-resolution spatiotemporal environmental data were not available for most of the environmental factors considered in our ecological niche modelling analyses. The predictors we used (e.g., land-use variables, poultry density) reflect long-term ecological suitability, and we acknowledge that considering short-term seasonal variation instead could be an interesting perspective for future work, which is now explicitly stated in the Discussion section: “In addition, aligning outbreak occurrences with seasonally matched environmental variables could further refine predictions of HPAI risk linked to migratory dynamics.”

      I would expect quite different immunity dynamics between domestic and wild birds as a function of lifespan and birth rates - though no obvious sign of that in the raw data. A statement on assumptions in that respect would be good.

      Thank you for the comment. We agree that domestic and wild birds likely exhibit different immunity dynamics due to differences in lifespan, turnover rates, and exposure. However, our analyses did not explicitly model immunity processes, and the data did not show a clear signal of these differences.

      Decisions and analytical tactics from Dhingra et al are adopted here in a way that doesn't quite convey the rationale, or justify its use here.

      We thank the Reviewer for this observation. However, we do not agree with the notion that the rationale for using Dhingra et al.’s analytical framework is insufficiently conveyed. We adapted key components of their ecological niche modelling approach (such as the boosted regression tree methodology and the pseudo-absence sampling procedure) to ensure comparability with their previous findings, while also extending the analysis to additional time periods and host categories (wild vs. domestic birds). This framework aligns with the main objective of our study, which is to assess shifts in ecological suitability for HPAI over time and across host species, in light of changing viral dynamics.

      Please go over the manuscript and harmonise the language about the model target - it is usually referred to as cases, but sometimes the pathogen, and others the wild and domestic birds where the cases were discovered.

      We agree and we have now modified the text to only use the “cases” or “occurrences” terminology when referring to the model inputs.

      Is the reporting of your BRT implementation correct? The text suggests that only 10 trees were run per replicate (of which there were 10 per response (domestic/wild x H5N1 / H5Nx) x distinct covariate set), but this would suggest that the authors were scarcely benefiting from the 'boosting' part of the BRTs that allows them to accurately estimate curvilinear functions. As additional trees are added, they should still be improving the loss function, and dramatically so in the early stages. The authors seem heavily guided by Elith et al.'s excellent paper[1] explaining BRTs and the companion tutorial piece, but in that work, the recommended approach is to run an initial model with a relatively quick learning rate that achieves the best fit to the held-out data at somewhere over 1000 trees, and then to refine the model to that number of trees with a slower learning rate. If the authors did indeed run only 10 trees, I think that should be explained.

      For each model, we used the “gbm.step” function to fit boosted regression trees, initiating the process with 10 trees and allowing up to 10,000 trees in steps of 5. The optimal number of trees was automatically determined by minimising the cross-validated deviance, following the recommended approach of Elith and colleagues (2008, J. Anim. Ecol.). This setup allows the boosting algorithm to iteratively improve model performance while avoiding overfitting. These aspects are now further clarified in the Materials and Methods section: “All BRT analyses were run and averaged over 10 cross-validated replicates, with a tree complexity of 4, a learning rate of 0.01, a tolerance parameter of 0.001, and while considering 5 spatial folds. Each model was initiated with 10 trees, and additional trees were incrementally added (in steps of 5) up to a maximum of 10,000, with the optimal number selected based on cross-validation tests.”
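
      The tree-number schedule described above can be sketched as a search over ensemble sizes. This is a simplified illustration, not the internals of `gbm.step`, which grows the ensemble incrementally and monitors held-out deviance as it goes; here `cv_deviance` is a hypothetical callable returning the cross-validated deviance for a given number of trees.

```python
from typing import Callable


def select_n_trees(cv_deviance: Callable[[int], float],
                   start: int = 10, step: int = 5, max_trees: int = 10_000) -> int:
    """Return the number of trees minimising cross-validated deviance.

    Tree counts start, start + step, ..., up to max_trees are evaluated,
    mirroring the 10 / 5 / 10,000 settings quoted in the response.
    """
    best_n, best_dev = start, float("inf")
    for n in range(start, max_trees + 1, step):
        dev = cv_deviance(n)
        if dev < best_dev:  # strict '<' keeps the smallest n on ties
            best_n, best_dev = n, dev
    return best_n


# Hypothetical deviance curve: improves, then overfits beyond 1,235 trees
print(select_n_trees(lambda n: (n - 1235) ** 2 / 1e6 + 0.5))  # 1235
```

      Under this schedule, 10 trees is only the starting point; the retained ensemble size is wherever held-out deviance bottoms out, which addresses the Reviewer's concern about under-boosted models.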

      I'm uncomfortable with the strong interpretation of changes in indices such as those for diversity in the case of bird species with detected cases of avian influenza, and the relative influence of covariates in the environmental niche models. In the former case, if surveillance effort is increasing it might be expected that more species will be found to be infected. In the latter, I'm just not convinced that these fundamentally correlative models can support the interpretation of changing epidemiology as asserted by authors. This strikes me as particularly problematic in light of static and in some cases anachronistic predictor sets.

      We thank the Reviewer for drawing attention to how changes in surveillance intensity might influence our diversity estimates. We have now integrated a new analysis to evaluate the increase in the number of wild birds tested and discussed the potential impact of this increase on the comparison of the bird species diversity metrics presented in our study, which is now interpreted with more caution: “To evaluate whether the post-2020 increase in species diversity estimated for infected wild birds could result from an increase in the number of tests performed on wild birds, we compared European annual surveillance test counts (EFSA et al., 2025, 2019) before and after 2020 using a Wilcoxon rank-sum test. We relied on European data because it was readily accessible and offered standardised and systematically collected metrics across multiple years, making it suitable for a comparative analysis. Although borderline significant (p-value = 0.063), the Wilcoxon rank-sum test indeed highlighted a recent increase in the number of wild bird tests (on average >11,000/year pre-2020 and >22,000 post-2020), which indicates that the comparison of bird species diversity metrics should be interpreted with caution. However, in the context of a passive surveillance framework, such an increase in the number of tests would also be consistent with an increase in the number of wild birds found dead and thus tested. Therefore, while the increase in the number of tests could indeed impact species diversity metrics such as the Shannon index, it can also reflect an absolute increase in wild bird mortality, in line with a broadened range of infected bird species.”
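
      For readers less familiar with the test, the sketch below implements an exact two-sided Wilcoxon rank-sum test by enumerating all possible group assignments, which is feasible for a handful of annual counts. It uses mid-ranks for ties. The yearly counts are placeholders, not the EFSA figures, and library implementations (for example those using a normal approximation) may report slightly different p-values.

```python
from itertools import combinations


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i..j
        i = j + 1
    return ranks


def ranksum_exact(x, y):
    """Exact two-sided Wilcoxon rank-sum test; returns (W, p_value).

    W is the rank sum of y in the pooled sample; the p-value is the share of
    all len(y)-subsets whose rank sum is at least as far from its mean.
    """
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n, m = len(pooled), len(y)
    w_obs = sum(ranks[len(x):])
    mean_w = m * (n + 1) / 2
    extreme = total = 0
    for idx in combinations(range(n), m):
        w = sum(ranks[i] for i in idx)
        total += 1
        if abs(w - mean_w) >= abs(w_obs - mean_w) - 1e-9:
            extreme += 1
    return w_obs, extreme / total


# Placeholder annual counts of wild birds tested (not real data)
pre = [9000, 11000, 12000, 13000]
post = [15000, 22000, 30000]
print(ranksum_exact(pre, post))  # W = 18.0, p = 2/35 ≈ 0.057
```

      With only a few years per group, the smallest attainable p-values are coarse, which is one reason a borderline result such as p = 0.063 warrants the cautious interpretation given above.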

    1. Tentative dates

      Hi Chris,

      I think we should add a calendar option for selecting dates (similar to flight booking) instead of just writing out the months. The tentative dates are flexible, but they’ll likely be close to the actual days. Plus, this approach would give a more professional appearance. Please check with Nash and proceed if it makes sense.

      Thanks.

    1. eLife Assessment

      This important manuscript presents a thorough analysis of the evolution of Major Histocompatibility Complex gene families across Primates. A key strength of this analysis is the use of state-of-the-art phylogenetic methods to estimate rates of gene gain and loss, accounting for the notorious difficulty of properly assembling MHC genomic regions. Overall, the evidence for the authors' conclusion, that there is considerable diversity in how MHC diversity is deployed across species, is compelling.

    2. Joint Public Review:

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their role in rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust.

      Editorial note:

      The authors have responded to the previous reviews and the Assessment was updated without involving the reviewers again.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their role in rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust insofar as the data might be complete enough to draw such conclusions.

      Weaknesses:

      One concern about the present work is that it relies on public databases to draw inferences about gene loss, which is potentially risky if the publicly available sequence data are incomplete. To say, for example, that a particular MHC gene copy is absent in a taxon (e.g., Class I locus F absent in Guenons according to Figure 1), we need to trust that its absence from the available databases is an accurate reflection of its absence in the genome of the actual organisms. This may be a safe assumption, but it rests on the completeness of genome assembly (and gene annotations?) or people uploading relevant data. This reviewer would have been far more comfortable had the authors engaged in some active spot-checking, doing the lab work to try to confirm absences at least for some loci and some species. Without this, a reader is left to wonder whether gene loss is simply reflecting imperfect databases, which then undercuts confidence in estimates of rates of gene loss.

      Indeed, just because a locus has not been confirmed in a species does not necessarily mean that it is absent. As we explain in the Figure 1 caption, only a few species have had their genomes extensively studied (gray background), and only for these species does the absence of a point in this figure mean that a locus is absent. The white background rows represent species that are not extensively studied, and we point out that for these species the absence of a point does not mean that a locus is absent, only that it has not yet been discovered. We have also added a parenthetical to the text to explain this (line 156): “Only species with rows highlighted in gray have had their MHC regions extensively studied (and thus only for these rows is the absence of a gene symbol meaningful).”

      While we agree that spot-checking may be a helpful next step, one of the goals of this manuscript is to collect and synthesize the enormous volume of MHC evolution research in the primates, which will serve as a jumping-off point for other researchers to perform important wet lab work.

      Some context is useful for comparing rates of gene turnover in MHC to other loci. Changing gene copy numbers, duplications, and loss of duplicates seem to be common across many loci and many organisms; is MHC exceptional in this regard, or merely behaving like any moderately large gene family? I would very much have liked to see comparable analyses done for other gene families (immune, like TLRs, or non-immune), and quantitative comparisons of evolutionary rates between MHC versus other genes. Does MHC gene composition evolve any faster than a random gene family? At present readers may be tempted to infer this, but evidence is not provided.

      Our companion paper (Fortier and Pritchard, 2025) demonstrates that the MHC is a unique locus in many regards, such as its evidence for deep balancing selection and its excess of disease associations. Thus, we expect that it is evolving faster than any random gene family. It would be interesting to repeat this analysis for other gene families, but that is outside of the scope of this project. Additionally, allele databases for other gene families are not nearly as developed, but as more alleles become available for other polymorphic families, a comparable analysis could become possible.

      We have added a paragraph to the discussion (lines 530-546) to clarify that we do not know for certain whether the MHC gene family is evolving rapidly compared to other gene families.

      While on the topic of making comparisons, the authors make a few statements about relative rates. For instance, lines 447-8 compare gene topology of classical versus non-classical genes; and line 450 states that classical genes experience more turnover. But there are no quantitative values given to these rates to provide numerical comparisons, nor confidence intervals provided (these are needed, given that they are estimates), nor formal statistical comparisons to confirm our confidence that rates differ between types of genes.

      More broadly, the paper uses sophisticated phylogenetic methods, but without taking advantage of macroevolutionary comparative methods that allow model-based estimation of macroevolutionary rates. I found the lack of quantitative measurements of rates of gene gain/loss to be a weakness of the present version of the paper, and something that should be readily remedied. When claiming that MHC Class I genes "turn over rapidly" (line 476) - what does rapidly mean? How rapidly? How does that compare to rates of genetic turnover at other families? Quantitative statements should be supported by quantitative estimates (and their confidence intervals).

      These statements refer to qualitative observations, so we cannot provide numerical values. We simply conclude that certain gene groups evolve faster or slower based on the species and genes present in each clade. It is difficult to provide estimates because of the incomplete sampling of genes that survived to the present day. In addition, the presence or absence of various orthologs in different species still needs to be confirmed, at which point it might be useful to be more quantitative. We have also added a paragraph to the discussion to address this concern and advocate for similar analyses of other gene families in the future when more data is available (lines 530-546).

      The authors refer to 'shared function of the MHC across species' (e.g. line 22); while this is likely true, they are not here presenting any functional data to confirm this, nor can they rule out neofunctionalization or subfunctionalization of gene duplicates. There is evidence in other vertebrates (e.g., cod) of MHC evolving appreciably altered functions, so one may not safely assume the function of a locus is static over long macroevolutionary periods, although that would be a plausible assumption at first glance.

      Indeed, we cannot assume that the function of a locus is static across time, especially for the MHC region. In our research, we read hundreds of papers that each focused on a small number of species or genes and gathered some information about them, sometimes based on functional experiments and sometimes on measures such as dN/dS. These provide some indication of a gene’s broad classification in a species or clade, even if the evidence is preliminary. Where possible, we used this preliminary evidence to assign genes the descriptors “classical,” “non-classical,” “dual characteristics,” “pseudogene,” “fixed,” or “unfixed.” Sometimes multiple individuals and haplotypes were analyzed, so we could even assign a minimum number of gene copies present in a species. We have aggregated all of these references into Supplementary Table 1 (for Class I/Figure 1) and Supplementary Table 2 (for Class II/Figure 2), along with specific details about which data points in these figures each reference supports. We realize that many of these classifications are based on a small number of individuals or indirect measures, so they may change in the future as more functional data is generated.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive understanding of the evolutionary history of the Major Histocompatibility Complex (MHC) gene family across primate species. Specifically, they sought to:

      (1) Analyze the evolutionary patterns of MHC genes and pseudogenes across the entire primate order, spanning 60 million years of evolution.

      (2) Build gene and allele trees to compare the evolutionary rates of MHC Class I and Class II genes, with a focus on identifying which genes have evolved rapidly and which have remained stable.

      (3) Investigate the role of often-overlooked pseudogenes in reconstructing evolutionary events, especially within the Class I region.

      (4) Highlight how different primate species use varied MHC genes, haplotypes, and genetic variation to mount successful immune responses, despite the shared function of the MHC across species.

      (5) Fill gaps in the current understanding of MHC evolution by taking a broader, multi-species perspective using (a) phylogenomic analytical computing methods such as Beast2, Geneconv, BLAST, and the much larger computing capacities that have been developed and made available to researchers over the past few decades, (b) literature review for gene content and arrangement, and genomic rearrangements via haplotype comparisons.

      (6) The authors' overall conclusions based on their analyses and results are that 'different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response'.

      Strengths:

      Essentially, much of the information presented in this paper is already well-known in the MHC field of genomic and genetic research, with few new conclusions and with insufficient regard for past studies. Nevertheless, while MHC evolution is a well-studied area, this paper potentially adds some originality through its comprehensive, cross-species evolutionary analysis of primates, its focus on pseudogenes, and the modern, large-scale methods employed. Its originality lies in its broad evolutionary scope of the primate order among mammals with solid methodological and phylogenetic analyses.

      The main strengths of this study are the use of large publicly available databases for primate MHC sequences, the intensive computing involved, the phylogenetic tool Beast2 to create multigene Bayesian phylogenetic trees using sequences from all genes and species, separated into Class I and Class II groups to provide a backbone of broad relationships to investigate subtrees, and the presentation of various subtrees as species and gene trees in an attempt to elucidate the unique gene duplications within the different species. The study provides some additional insights with summaries of MHC reference genomes and haplotypes in the context of a literature review to identify the gene content and haplotypes known to be present in different primate species. The phylogenetic overlays or ideograms (Figures 6 and 7) in part show the complexity of the evolution and organisation of the primate MHC genes via the orthologous and paralogous gene and species pathways, progressing from the poorly studied New World monkeys (NWM), across a few moderately studied ape species, to the better-studied human MHC genes and haplotypes.

      Weaknesses:

      The title 'The Primate Major Histocompatibility Complex: An Illustrative Example of Gene Family Evolution' suggests that the paper will explore how the Major Histocompatibility Complex (MHC) in primates serves as a model for understanding gene family evolution. The term 'Illustrative Example' would be appropriate if the paper aimed to use the primate MHC as a clear and representative case to demonstrate broader principles of gene family evolution; that is, if the MHC gene family were presented not just as one instance of gene family evolution but as a well-studied, insightful example that highlights key mechanisms and concepts applicable to other gene families. However, this is not the case: the paper covers only specific details of primate MHC evolution without drawing broader lessons for other gene families. The term 'Illustrative Example' is therefore too broad or generalizing, and a term like 'Case Study' or simply 'Example' would be more suitable. Perhaps 'An Example of Gene Family Diversity' would be more precise. Also, an explanation or 'reminder' is suggested that this study is not about the origins of the MHC genes in the earliest jawed vertebrates per se (~600 mya), but an extension within a subset of species that emerged relatively late (~60 mya) along the divergent evolutionary pathways of the MHC genes, systems, and various vertebrate species.

      Thank you for your input on the title; we have changed it to “A case study of gene family evolution” instead.

      Thank you also for pointing out the potential confusion about the time span of our study. We have added “Having originated in the jawed vertebrates,” to a sentence in the introduction (lines 38-39). We have also added the sentence “Here, we focus on the primates, spanning approximately 60 million years within the over 500-million-year evolution of the family \citep{Flajnik2010}.” to be more explicit about the context for our work (lines 59-61).

      Phylogenomics. Particular weaknesses in this study are the limitations and problems of using phylogenetic gene and species trees to try to resolve the molecular mechanisms behind imperfect gene duplications, losses, and rearrangements in a complex genomic region such as the MHC, which affects both the response and the regulation of the immune system. A particular deficiency is drawing conclusions based on a single exon of the genes. Different exons present different trees. Which are the more reliable? Why were introns not included in the analyses? The authors attempt to overcome these limitations by including genomic haplotype analysis, duplication models, and the supporting or contradictory information available in previous publications. They succeed in part with this multidisciplinary approach, but much is missed because of biased literature selection. The authors should include a paragraph about the benefits and limitations of the software that they have chosen for their analysis, and perhaps suggest some alternative tools that they might have tried for comparison. How were problems with Bayesian phylogenetics, such as computational intensity, the choice of prior probabilities, the choice of particular exons for analysis, the assumptions of evolutionary models, rates of evolution, systematic bias, and the absence of structural and functional information, addressed and controlled for in this study?

      We agree that different exons have different trees, which is exactly why we repeated our analysis for each exon in order to compare and contrast them. In particular, the exons encoding the binding site of the resulting protein (exons 2 and 3 for Class I and exon 2 for Class II) show evidence for trans-species polymorphism and gene conversion. These phenomena lead to trees that do not follow the species tree and are fascinating in and of themselves, which we explore in detail in our companion paper (Fortier and Pritchard, 2025). Meanwhile, the non-peptide-binding extracellular-domain-encoding exon (exon 4 for Class I and exon 3 for Class II) is comparably sized to the binding-site-encoding exons and provides an interesting functional contrast. As this exon is likely less affected by trans-species polymorphism, gene conversion, and convergent evolution, we present results from it most often in the main text, though we occasionally touch on differences between the exons. See lines 191-196, 223-226, and 407-414 for some examples of how we discuss the exons in the text. Additionally, all trees from all of these exons can be found in the supplement. 

      We agree that introns would be valuable to study in this context. Even though the non-binding-site-encoding exons are probably *less* affected by trans-species polymorphism, gene conversion, and convergent evolution, they are still functional. The introns, however, experience much more relaxed selection, if any, and comparing their trees to those for the exons would be valuable and illuminating. We did not generate intron trees for two reasons. Most importantly, there is a dearth of intron data; in the databases we used, intron sequences were often available only for human, chimpanzee, and sometimes macaque, and only for a small subset of the genes. This limitation is at odds with the comprehensive, many-gene-many-species approach that we feel is the main novelty of this work. Secondly, the introns that *are* available are difficult to align. Even aligning the exons across such a highly diverged set of genes and pseudogenes was difficult and required manual effort, and the introns proved even harder to align across genes. In the future, when more intron data are available and sufficient effort is put into aligning them, it will be possible and desirable to do a comparable analysis. We also added a sentence to the “Data” section to briefly explain why we did not include introns (lines 134-135).

      We explain our Bayesian phylogenetics approach in detail in the Methods (lines 650-725), including our assumptions and our solutions to challenges specific to this application. For further explanation of the method itself, we suggest reading the original BEAST and BEAST2 papers (Drummond & Rambaut (2007), Drummond et al. (2012), Bouckaert et al. (2014), and Bouckaert et al. (2019)). Known structural and functional information helped us validate the alignments we used in this study, but the fact that such information is not fully known for every gene and species should not affect the method itself.

      Gene families as haplotypes. In the Introduction, the MHC is referred to as a 'gene family', and in paragraph 2, it is described as being united by the 'MHC fold', despite exhibiting 'very diverse functions'. However, the MHC region is more accurately described as a multigene region containing diverse, haplotype-specific Conserved Polymorphic Sequences, many of which are likely to be regulatory rather than protein-coding. These regulatory elements are essential for controlling the expression of multiple MHC-related products, such as TNF and complement proteins, a relationship demonstrated over 30 years ago. Non-MHC-fold loci such as TNF, complement, POU5F1, lncRNA, TRIM genes, LTA, LTB, NFkBIL1, etc., are present across all MHC haplotypes and play significant roles in regulation. Evolutionary selection must act on genotypes, considering both paternal and maternal haplotypes, rather than on individual genes alone. While it is valuable to compile databases for public use, their utility is diminished if they perpetuate outdated theories like the 'birth-and-death model'. Including prior information or assumptions in a statistical or computational model, as is typical in Bayesian analysis, is commendable, but these priors should be based on genotypic data rather than older models. A more robust approach would consider the imperfect duplication of segments, the history of their conservation, and the functional differences in inheritance patterns. Additionally, the MHC should be examined as a genomic region, with ancestral haplotypes and sequence changes or rearrangements serving as key indicators of human evolution after the 'Out of Africa' migration, and with disease susceptibility providing a measurable outcome. There are more than 7000 different alleles at each of the HLA-B and HLA-C loci, which suggests that there are many thousands of human HLA haplotypes to study. In this regard, the studies by Dawkins et al. (1999 Immunol Rev 167,275) and Shiina et al. (2006 Genetics 173,1555) on human MHC gene diversity and disease hitchhiking (haplotypes), and by Sznarkowska et al. (2020 Cancers 12,1155) on the complex regulatory networks governing MHC expression, in terms of both immune transcription factor binding sites and regulatory non-coding RNAs, should be examined in greater detail, particularly in the context of MHC gene allelic diversity and locus organization in humans and other primates.

      Thank you for these comments. To clarify that the MHC “region” is different from (and contains) the MHC “gene family” as we describe it, we changed a sentence in the abstract (lines 8-10) from “One large gene family that has experienced rapid evolution is the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” to “One large gene family that has experienced rapid evolution lies within the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” We know that the region is complex and contains many other genes and regulatory sequences; Figure 1 of our companion paper (Fortier and Pritchard, 2025) depicts these in order to show the reader that the MHC genes we focus on are just one part of the entire region.

      We love the suggestion to look at the many thousands of alleles present at each of the classical loci. This is the focus of our complementary paper (Fortier and Pritchard, 2025), which explores variation at the allele level. In the current paper, we look mainly at the differences between genes and the use of different genes in different species.

      Diversifying and/or concerted evolution. Both this and past studies highlight diversifying or balancing selection as the dominant force in MHC evolution. This is primarily because the extreme polymorphism observed in MHC genes is advantageous for populations in terms of pathogen defence. Diversification increases the range of peptides that can be presented to T cells, enhancing the immune response. The peptide-binding regions of MHC genes are highly variable, and this variability is maintained through selection for immune function, especially in the face of rapidly evolving pathogens. In contrast, concerted evolution, which typically involves the homogenization of gene duplicates through processes like gene conversion or unequal crossing-over, seems to play a minimal role in MHC evolution. Although gene duplication events have occurred in the MHC region, leading to the expansion of gene families, the resulting paralogs often undergo divergent evolution rather than being kept similar or homogeneous by concerted evolution. Therefore, unlike gene families such as ribosomal RNA genes or histone genes, where concerted evolution leads to highly similar copies, MHC genes display much higher levels of allelic and functional diversification. Each MHC gene copy tends to evolve independently after duplication, acquiring unique polymorphisms that enhance the repertoire of antigen presentation, rather than undergoing homogenization through gene conversion. Also, in some populations with high polymorphism or strong genetic drift, allele frequencies may become similar over time without any gene conversion. This similarity can be mistaken for gene conversion when it is simply due to neutral evolution or drift, particularly in small populations or bottlenecked species. Moreover, gene conversion might contribute to greater diversity by creating hybrids or mosaics between different MHC genes. In this regard, can the authors indicate what percentage of the genes in their study have been homogenised by gene conversion compared to those that have been diversified by gene conversion?

      We appreciate the summary, and we feel we have appropriately discussed both gene conversion and diversifying selection in the context of the MHC genes. Because we cannot know for sure when and where gene conversion has occurred, we cannot quantify percentages of genes that have been homogenized or diversified.  

      Duplication models. The phylogenetic overlays or ideograms (Figures 6 and 7) show considerable imperfect multigene duplications, losses, and rearrangements, but the paper's Discussion provides no in-depth consideration of the various multigenic models or mechanisms that can explain such events. How do the authors' duplication models compare to those proposed by others? For example, their text simply says on line 292, 'the proposed series of events is not always consistent with phylogenetic data'. How, why, when? Duplication models for the generation and extension of the human MHC class I genes as duplicons (extended gene or segmental genomic structures) by parsimonious imperfect tandem duplications with deletions and rearrangements in the alpha, beta, and kappa blocks were already formulated in the late 1990s and extended to the rhesus macaque in 2004 based on genomic haplotypic sequences. These studies were based on genomic sequences (genes, pseudogenes, retroelements), dot plot matrix comparisons, and phylogenetic analyses of gene and retroelement sequences using computer programs. It was already noted or proposed in these 1999 studies that (1) the ancestor of HLA-P(90)/-T(16)/-W(80) represented an old lineage separate from the other HLA class I genes in the alpha block, (2) HLA-U(21) is a duplicated fragment of HLA-A, (3) HLA-F and HLA-V(75) are among the earliest (progenitor) genes or outgroups within the alpha block, and (4) distinct Alu and L1 retroelement sequences adjoining the HLA-L(30) and HLA-N genomic segments (duplicons) in the kappa block are closely related to those in the HLA-B and HLA-C segments in the beta block, suggesting an inverted duplication and transposition of the HLA genes and retroelements between the beta and kappa regions. None of these prior human studies were referenced by Fortier and Pritchard in their paper. How does their human MHC class I gene duplication model (Fig. 6), including gene duplication numbers and turnovers, differ from those previously proposed and described by Kulski et al. (1997 JME 45,599), (1999 JME 49,84), (2000 JME 50,510), Dawkins et al. (1999 Immunol Rev 167,275), and Gaudieri et al. (1999 GR 9,541)? Is this a case of reinventing the wheel?

      Figures 6 and 7 are intended to synthesize and reconcile past findings and our own trees, so they do not strictly adhere to the findings of any particular study and cannot fully match all studies. In the supplement, Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1 duly credit all of the past work that went into making these trees. Most previous papers focus on just one aspect of these trees, such as haplotypes within a species, a specific gene or allelic lineage relationship, or the branching pattern of particular gene groups. We believe it was necessary to bring all of these pieces of evidence together. Even among papers with the same focus (to understand the block duplications that generated the current physical layout of the MHC), results differ. For example, Geraghty (1992), Hughes (1995), Kulski (2004)/Kulski (2005), and Shiina (1999) all disagree on the exact branching order of the genes MHC-W, -P, and -T, and of MHC-G, -J, and -K. While the Kulski studies you pointed out were very thorough for their era, they still relied on data from only three species and one haplotype per species. Our work is not intended to replace or discredit these past works, but simply to build upon them with a larger set of species and sequences. We hope the hypotheses we propose in Figures 6 and 7 can help unify existing research and provide a more easily accessible jumping-off point for future work.

      Results. The results are presented as new findings, whereas the significance and importance of most, if not all, of the results have already been discussed in various other publications. Therefore, the authors might do better to combine the results and discussion into a single section, with appropriate citations to previously published findings presented alongside their results for comparison. Do the trees and subsets differ from previous publications, given that those publications might have had fewer comparative examples and samples than the present preprint? Alternatively, the results and discussion could be combined and presented as a review of the field, which would make more sense and be more honest than the current format of essentially rehashing old data.

      In starting this project, we found that a large barrier to entry to this field of study is the immense amount of published literature over 30+ years. It is both time-consuming and confusing to read up on the many nuances of the MHC genes, their changing names, and their evolution, making it difficult to start new, innovative projects. We acknowledge that while our results are not entirely novel, the main advantage of our work is that it provides a thorough, comprehensive starting point for others to learn about the MHC quickly and dive into new research. We feel that we have appropriately cited past literature in the main text, appendices, and supplement, so that readers may dive into a particular area with ease.

      Minor corrections:

      (1) Abstract, line 19: 'modern methods'. Too general. What modern methods?

      To keep the abstract brief, the methods are introduced in the main text when each becomes relevant as well as in the methods section.

      (2) Abstract, line 25: 'look into [primate] MHC evolution.' The analysis is on the primate MHC genes, not on the entire vertebrate MHC evolution with a gene collection from sharks to humans. The non-primate MHC genes are often differently organised and structurally evolved in comparison to primate MHC.

      Thank you! We have added the word “primate” to the abstract (line 25).

      (3) Introduction, line 113. 'In a companion paper (Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (4) Figures 1 and 2. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. 'Asterisks "within symbols" indicate new information.'

      Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (5) Figures. A variety of colours have been applied for visualisation. However, some coloured texts are so light in colour that they are difficult to read against a white background. Could darker colours or black be used for all or most texts?

      With such a large number of genes and species to handle in this work, it was nearly impossible to choose a set of colors that were distinct enough from each other. We decided to prioritize consistency (across this paper, its supplement, and our companion paper) as well as at-a-glance grouping of similar sequences. Unfortunately, this means we had to sacrifice readability on a white background, but readers may turn to the supplement if they need to access specific sequence names.

      (6) Results, line 135. '(Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      Repeat of (3). This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (7) Results, lines 152 to 153, 164, 165, etc. 'Points with an asterisk'. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. A point is a small dot such as those used in data points for plotting graphs .... The figures are so small that the asterisks in the circles, squares, triangles, etc, look like points (dots) and the points/asterisks terminology that is used is very confusing visually.

      Repeat of (4). Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (8) Line 178 (BEA, 2024) is not listed alphabetically in the References.

      Thank you for catching this! This reference maps to the first bibliography entry, “SUMMARIZING POSTERIOR TREES.” We are unsure how to cite a webpage that has no explicit author within the eLife Overleaf template, so we will consult with the editor.

      (9) Lines 188-190. 'NWM MHC-G does not group with ape/OWM MHC-G, instead falling outside of the clade containing ape/OWM MHC-A, -G, -J and -K.' This is not surprising given that MHC-A, -G, -J, and -K are paralogs of each other and that some of them, especially in NWM have diverged over time from the paralogs and/or orthologs and might be closer to one paralog than another and not be an actual ortholog of OWM, apes or humans.

      We included this sentence to clarify the relationships between genes and to help describe what is happening in Figure 6. Figure 6 - figure supplement 1 includes all of the references that go into such a statement and Appendix 3 details our reasoning for this and other statements.

      (10) Line 249. Gene conversion: This is recombination between two different genes in which a portion of the genes is exchanged, so that different portions of the gene can group within one or the other of the two gene clades. Alternatively, the gene has been annotated incorrectly if it does not group within either of the two alternative clades. Another possibility is that one or two nucleotide mutations have occurred without recombination, resulting in a mistaken interpretation or conclusion of a recombination event. How many MHC gene conversion (recombination) events have occurred according to the authors' estimates? What measures are taken to avoid false-positive conclusions?

      All of these possibilities are certainly valid. We used the program GENECONV to infer gene conversion events, but there is considerable uncertainty owing to the ages of the genes and the inevitable point mutations that have occurred post-event. Gene conversion was not the focus of our paper, so we did our best to acknowledge it (and the resulting differences between trees from different exons) without spending too much time diving into it. A list of inferred gene conversion events can be found in Figure 3 - source data 1 and Figure 4 - source data 1.

      (11) Lines 284-286. 'The Class I MHC region is further divided into three polymorphic blocks-alpha, beta, and kappa blocks-that each contains MHC genes but are separated by well-conserved non-MHC genes.' The MHC class I region was first divided into conserved polymorphic duplication blocks, alpha and beta, by Dawkins et al. (1999 Immunol Rev 167,275), with the kappa block added by Kulski et al. (2002 Immunol Rev 190,95), and these works should be acknowledged (cited) accordingly.

      Thank you for catching this! We have added these citations (lines 302-303)!

      (12) Lines 285-286. 'The majority of the Class I genes are located in the alpha-block, which in humans includes 12 MHC genes and pseudogenes.' This is not strictly correct for many other species, because the majority of class I genes might be in the beta block of new and old-world monkeys, and the authors haven't provided respective counts of duplication numbers to show otherwise. The alpha block in some non-primate mammalian species such as pigs, rats, and mice has no MHC class I genes or only a few. Most MHC class I genes in non-primate mammalian species are found in other regions. For example, see Ando et al (2005 Immunogenetics 57,864) for the pig alpha, beta, and kappa regions in the MHC class I region. There are no pig MHC genes in the alpha block.

      Yes, which is exactly why we use the phrase “in humans” in that particular sentence. The arrangement of the MHC in several other primate reference genomes is shown in Figure 1 - figure supplement 2.

      (13) Line 297 to 299. 'The alpha-block also contains a large number of repetitive elements and gene fragments belonging to other gene families, and their specific repeating pattern in humans led to the conclusion that the region was formed by successive block duplications (Shiina et al., 1999).' There are different models for successive block duplications in the alpha block and some are more parsimonious based on imperfect multigenic segmental duplications (Kulski et al 1999, 2000) than others (Shiina et al., 1999). In this regard, Kulski et al (1999, 2000) also used duplicated repetitive elements neighbouring MHC genes to support their phylogenetic analyses and multigenic segmental duplication models. For comparison, can the authors indicate how many duplications and deletions they have in their models for each species?

      We have added citations to this sentence to show that there are different published models to describe the successive block duplications (line 307). Our models in Figure 6 and Figure 7 are meant to aggregate past work and integrate our own, and thus they were not built strictly by parsimony. References can be found in Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1.

      (14) Lines 315-315. 'Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment.' This sentence should be deleted. Other researchers had already inferred that MHC-U is actually an MHC-A-related gene fragment more than 25 years ago (Kulski et al 1999, 2000) when the MHC-U was originally named MHC-21.

      While these works certainly describe MHC-U/MHC-21 as a fragment in the 𝛼-block, any relation to MHC-A was by association only and very few species/haplotypes were examined. So although the idea is not wholly novel, we provide convincing evidence that not only is MHC-U related to MHC-A by sequence, but also that it is a very recent partial duplicate of MHC-A. We show this with Bayesian phylogenetic trees as well as an analysis of haplotypes across many more species than were included in those papers.  

      (15) Lines 361-362. 'Notably, our work has revealed that MHC-V is an old fragment.' This is not a new finding or hypothesis. Previous phylogenetic analysis and gene duplication modelling had already inferred HLA-V (formerly HLA-75) to be an old fragment (Kulski et al 1999, 2000).

      By “old,” we mean older than previous hypotheses suggest. Previous work has proposed that MHC-V and -P were duplicated together, with MHC-V deriving from an MHC-A/H/V ancestral gene and MHC-P deriving from an MHC-W/T/P ancestral gene (Kulski (2005), Shiina (1999)). However, our analysis (Figure 5A) shows that MHC-V sequences form a monophyletic clade outside of both the MHC-W/P/T group and the MHC-A/B/C/E/F/G/J/K/L group, which is not consistent with MHC-A and -V being closely related. We therefore conclude that MHC-V split off before these other gene groups differentiated, and is thus older than previously thought. We explain this in the text (lines 317-327) and in Appendix 3.

      (16) Line 431-433. 'the Class II genes have been largely stable across the mammals, although we do see some lineage-specific expansions and contractions (Figure 2 and Figure 2-gure Supplement 2).' Please provide one or two references to support this statement. Is 'gure' a typo?

      We corrected this typo, thank you! This conclusion is simply drawn from the data presented in Figure 2 and Figure 2 - figure supplement 2. The data itself comes from a variety of sources, which are already included in the supplement as Figure 2 - source data 1.

      (17) Line 437. 'We discovered far more "specific" events in Class I, while "broad-scale" events were predominant in Class II.' Please define the difference between 'specific' and 'broad-scale'.

      These terms are defined in the previous sentence (lines 466-469).

      (18) Lines 450-451. 'This shows that classical genes experience more turnover and are more often affected by long-term balancing selection or convergent evolution.' Is balancing selection a form of divergent evolution that is different from convergent evolution? Please explain in more detail how and why balancing selection or convergent evolution affects classical and nonclassical genes differently.

      Balancing selection acts to keep alleles at moderate frequencies, preventing any from fixing in the population. In contrast, convergent evolution describes sequences or traits becoming similar over time even though they are not similar by descent. While we cannot know exactly what selective forces have acted in the past, we observe different patterns in the trees for each type of gene. In Figures 1 and 2, readers can see at first glance that the nonclassical genes (which are named throughout the text and thoroughly described in Appendix 3) appear to be longer-lived than the classical genes. In addition, lines 204-222 and 475-488 describe topological differences in the BEAST2 trees of these two types of genes. However, we acknowledge that it could be helpful to have additional, complementary information about the classical vs. non-classical genes. Thus, we have added a sentence and reference to our companion paper (Fortier and Pritchard, 2025), which focuses on long-term balancing selection and draws further contrast between classical and non-classical genes. In lines 481-484, we added “We further explore the differences between classical and non-classical genes in our companion paper, finding ancient trans-species polymorphism at the classical genes but not at the non-classical genes \citep{Fortier2025b}.”
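      As a minimal, hypothetical illustration of the definition of balancing selection given above (not a model used in the paper), the sketch below iterates one textbook form of balancing selection, heterozygote advantage, at a single biallelic locus in an infinitely large, randomly mating population. The genotype fitnesses (AA = 1 - s, Aa = 1, aa = 1 - t) and the values of s and t are illustrative assumptions. Under these assumptions, the frequency of allele A moves toward the intermediate equilibrium t/(s + t) from any starting frequency strictly between 0 and 1, rather than fixing.

```python
def next_freq(p: float, s: float, t: float) -> float:
    """One generation of selection under heterozygote advantage.

    Assumed genotype fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t.
    p is the frequency of allele A before selection; the return value
    is its frequency after selection (random mating, no drift).
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    # Mean fitness of the population under Hardy-Weinberg genotype frequencies.
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A alleles after selection come from AA homozygotes and half of the heterozygotes.
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def iterate(p0: float, s: float, t: float, generations: int) -> float:
    """Apply next_freq repeatedly, starting from allele frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.1, 0.2  # illustrative selection coefficients against AA and aa
    for p0 in (0.01, 0.5, 0.99):
        print(f"start {p0}: after 2000 generations {iterate(p0, s, t, 2000):.4f}")
    print(f"predicted equilibrium t/(s + t) = {t / (s + t):.4f}")
```

      With s = 0.1 and t = 0.2, all three starting frequencies should approach 2/3 instead of 0 or 1. Because the model is deterministic, it omits genetic drift, which in real, finite populations can still cause alleles to be lost.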

      References

      Some references in the supplementary materials such as Alvarez (1997), Daza-Vamenta (2004), Rojo (2005), Aarnink (2014), Kulski (2022), and others are missing from the Reference list. Please check that all the references in the text and the supplementary materials are listed correctly and alphabetically.

      We will make sure that these all show up properly in the proof.

      Reviewer #3 (Public review):

      Summary:

      The article provides the most comprehensive overview of primate MHC class I and class II genes to date, combining published data with an exploration of the available genome assemblies in a coherent phylogenetic framework and formulating new hypotheses about the evolution of the primate MHC genomic region.

      Strengths:

      I think this is a solid piece of work that will be the reference for years to come, at least until population-scale haplotype-resolved whole-genome resequencing of any mammalian species becomes standard. The work is timely because there is an obvious need to move beyond short amplicon-based polymorphism surveys and classical comparative genomic studies. The paper is data-rich and the approach taken by the authors, i.e. an integrative phylogeny of all MHC genes within a given class across species and the inclusion of often ignored pseudogenes, makes a lot of sense. The focus on primates is a good idea because of the wealth of genomic and, in some cases, functional data, and the relatively densely populated phylogenetic tree facilitates the reconstruction of rapid evolutionary events, providing insights into the mechanisms of MHC evolution. Appendices 1-2 may seem unusual at first glance, but I found them helpful in distilling the information that the authors consider essential, thus reducing the need for the reader to wade through a vast amount of literature. Appendix 3 is an extremely valuable companion in navigating the maze of primate MHC genes and associated terminology.

      Weaknesses:

      I have not identified major weaknesses and my comments are mostly requests for clarification and justification of some methodological choices.

      Thank you so much for your kind and supportive review!

      Reviewer #1 (Recommendations for the authors):

      (1) Line 151: How is 'extensively studied' defined?

      Extensively studied is not a strict definition, but a few organisms clearly stand apart from the rest in terms of how thoroughly their MHC regions have been studied. For example, the macaque is a model organism, and individuals from many different species and populations have had their MHC regions fully sequenced. This is in contrast to the gibbon, for example, in which there is some experimental evidence for the presence of certain genes, but no MHC region has been fully sequenced from these animals.

      (2) Can you clarify how 'classical' and 'non-classical' MHC genes are being determined in your analysis?

      Classical genes are those whose protein products perform antigen presentation to T cells and are directly involved in adaptive immunity, while non-classical genes are those whose protein products do not do this. For example, these non-classical genes might code for proteins that interact with receptors on Natural Killer cells and influence innate immunity. The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (3) I find the overall tone of the paper to be very descriptive, and at times meandering and repetitive, with a lot of similar kinds of statements being repeated about gene gain/loss. This is perhaps inevitable because a single question is being asked of each of many subsets of MHC gene types, and even exons within gene types, so there is a lot of repetition in content with a slightly different focus each time. This does not help the reader stay focused or keep track. I found myself wishing for a clearly defined question or hypothesis, or some rate parameter in need of estimation. I would encourage the authors to tighten up their phrasing, or consider streamlining the results with some better signposting to organize ideas within the results.

      We totally understand your critique, as we talk about a wide range of specific genes and gene groups in this paper. To improve readability, we have added many more signposting phrases and sentences:

      “Aside from MHC-DRB, …” (line 173)

      “Now that we had a better picture of the landscape of MHC genes present in different primates, we wanted to understand the genes’ relationships. Treating Class I, Class IIA, and Class IIB separately, ...” (lines 179-180)

      “We focus first on the Class I genes.” (line 191)

      “... for visualization purposes…” (line 195)

      “We find that sequences do not always assort by locus, as would be expected for a typical gene.” (lines 196-197)

      “... rather than being directly orthologous to the ape/OWM MHC-G genes.” (lines 201-202)

      “Appendix 3 explains each of these genes in detail, including previous work and findings from this study.“ (lines 202-203)

      “... (but not with NWM) …” (line 208)

      “While genes such as MHC-F have trees which closely match the overall species tree, other genes show markedly different patterns, …” (lines 212-213)

      “Thus, while some MHC-G duplications appear to have occurred prior to speciation events within the NWM, others are species-specific.” (lines 218-219)

      “... indicating rapid evolution of many of the Class I genes” (lines 220-221)

      “Now turning to the Class II genes, …“ (line 223)

      “(see Appendix 2 for details on allele nomenclature)” (line 238)

      “(e.g. MHC-DRB1 or -DRB2)” (line 254)

      “...  meaning their names reflect previously-observed functional similarity more than evolutionary relatedness.” (lines 257-258)

      “(see Appendix 3 for more detail)” (line 311)

      “(a 5'-end fragment)” (line 324)

      “Therefore, we support past work that has deemed MHC-V an old fragment.” (lines 326-327)

      “We next focus on MHC-U, a previously-uncharacterized fragment pseudogene containing only exon 3.” (lines 328-329)

      “However, it is present on both chimpanzee haplotypes and nearly all human haplotypes, and we know that these haplotypes diverged earlier---in the ancestor of human and gorilla. Therefore, ...” (lines 331-333)

      “Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment and that it likely originated in the human-gorilla ancestor.” (lines 334-336)  

      “These pieces of evidence suggest that MHC-K and -KL duplicated in the ancestor of the apes.” (lines 341-342)

      “Another large group of related pseudogenes in the Class I $\alpha$-block includes MHC-W, -P, and -T (see Appendix 3 for more detail).” (lines 349-350)

      “...to form the current physical arrangement” (line 354)

      “Thus, we next focus on the behavior of this subgroup in the trees.” (line 358)

      “(see Appendix 3 for further explanation).” (line 369)

      “Thus, for the first time we show that there must have been three distinct MHC-W-like genes in the ape/OWM ancestor.” (lines 369-371)

      “... and thus not included in the previous analysis. ” (lines 376-377)

      “MHC-Y has also been identified in gorillas (Gogo-Y) (Hans et al., 2017), so we anticipate that Gogo-OLI will soon be confirmed. This evidence suggests that the MHC-Y and -OLI-containing haplotype is at least as old as the human-gorilla split. Our study is the first to place MHC-OLI in the overall story of MHC haplotype evolution“ (lines 381-384)

      “Appendix 3 explains the pieces of evidence leading to all of these conclusions (and more!) in more detail.” (lines 395-396)

      “However, looking at this exon alone does not give us a complete picture.” (lines 410-411)

      “...instead of with other ape/OWM sequences, …” (lines 413-414)

      “Figure 7 shows plausible steps that might have generated the current haplotypes and patterns of variation that we see in present-day primates. However, some species are poorly represented in the data, so the relationships between their genes and haplotypes are somewhat unclear.” (lines 427-429)

      “(and more-diverged)” (line 473)

      “(of both classes)” (line 476)

      “..., although the classes differ in their rate of evolution.” (lines 487-488)

      “Including these pseudogenes in our trees helped us construct a new model of $\alpha$-block haplotype evolution. “ (lines 517-518)

      (4) Line 480-82: "Notably...." why is this notable? Don't merely state that something is notable, explain what makes it especially worth drawing the reader's attention to: in what way is it particularly significant or surprising?

      We have changed the text from “Notably” to “In particular” (line 390) so that readers are expecting us to list some specific findings. Similarly, we changed “Notably” to “Specifically” (line 515).

      (5) The end of the discussion is weak: "provide context" is too vague and not a strong statement of something that we learned that we didn't know before, or its importance. This is followed by "This work will provide a jumping-off point for further exploration..." such as? What questions does this paper raise that merit further work?

      We have made this paragraph more specific and added some possible future research directions. It now reads “By treating the MHC genes as a gene family and including more data than ever before, this work enhances our understanding of the evolutionary history of this remarkable region. Our extensive set of trees incorporating classical genes, non-classical genes, pseudogenes, gene fragments, and alleles of medical interest across a wide range of species will provide context for future evolutionary, genomic, disease, and immunologic studies. For example, this work provides a jumping-off point for further exploration of the evolutionary processes affecting different subsets of the gene family and the nuances of immune system function in different species. This study also provides a necessary framework for understanding the evolution of particular allelic lineages within specific MHC genes, which we explore further in our companion paper \citep{Fortier2025b}. Both studies shed light on MHC gene family evolutionary dynamics and bring us closer to understanding the evolutionary tradeoffs involved in MHC disease associations.” (lines 576-586)

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 et seq. Classifying genes as having 'classical', 'non-classical' and 'dual' properties is notoriously difficult in non-model organisms due to the lack of relevant information. As you have characterised a number of genes for the first time in this paper and could not rely entirely on published classifications, please indicate the criteria you used for classification.

      The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (2) Line 61 It's important to mention that classical MHC molecules present antigenic peptides to T cells with variable alphabeta T cell receptors, as non-classical MHC molecules may interact with other T cell subsets/types.

      Thank you for pointing this out; we have updated the text to make this clearer (lines 63-65). We changed “‘Classical’ MHC molecules perform antigen presentation to T cells---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.” to “‘Classical’ MHC molecules perform antigen presentation to T cells with variable alphabeta TCRs---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.”

      (3) Perhaps it's worth mentioning in the introduction that you are deliberately excluding highly divergent non-classical MHC molecules such as CD1.

      Thank you, it’s worth clarifying exactly what molecules we are discussing. We have added a sentence to the introduction (lines 38-43): “Having originated in the jawed vertebrates, this group of genes is now involved in diverse functions including lipid metabolism, iron uptake regulation, and immune system function (proteins such as zinc-𝛼2-glycoprotein (ZAG), human hemochromatosis protein (HFE), MHC class I chain–related proteins (MICA, MICB), and the CD1 family) \citep{Hansen2007,Kupfermann1999,Kaufman2022,Adams2013}. However, here we focus on…”

      (4) Line 94-105 This material presents results, it could be moved to the results section as it now somewhat disrupts the flow.

      We feel it is important to include a “teaser” of the results in the introduction, which can be slightly more detailed than that in the abstract.

      (5) Line 118-131 This opening section of the results sets the stage for the whole presentation and contains important information that I feel needs to be expanded to include an overview and justification of your methodological choices. As the M&M section is at the end of the MS (and contains limited justification), some information on two aspects is needed here for the benefit of the reader. First, as far as I understand, all phylogenetic inferences were based entirely on DNA sequences of individual (in some cases concatenated) exons. It would be useful for the reader to explain why you've chosen to rely on DNA rather than protein sequences, even though some of the genes you include in the phylogenetic analysis are highly divergent. Second, a reader might wonder how the "maximum clade credibility tree" from the Bayesian analysis compares to commonly seen trees with bootstrap support or posterior probability values assigned to particular clades. Personally, I think that the authors' approach to identifying and presenting representative trees is reasonable (although one might wonder why "Maximum clade credibility tree" and not "Maximum credibility tree" https://www.beast2.org/summarizing-posterior-trees/), since they are working with a large number of short, sometimes divergent and sometimes rather similar sequences - in such cases, a requirement for strict clade support could result in trees composed largely of polytomies. However, I feel it's necessary to be explicit about this and to acknowledge that the relationships represented by fully resolved bifurcating representative trees and interpreted in the study may not actually be highly supported in the sense that many readers might expect. In other words, the reader should be aware from the outset of what the phylogenies that are so central to the paper represent.

      We chose to rely on DNA rather than protein sequences because convergent evolution is likely to happen in regions that code for extremely important functions such as adaptive and innate immunity. Convergent evolution acts upon proteins while trans-species polymorphism retains ancient nucleotide variation, so studying the DNA sequence can help tease apart convergent evolution from trans-species polymorphism.
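      The following minimal sketch makes the DNA-versus-protein point concrete. It translates two hypothetical coding fragments that differ only by synonymous changes, meaning nucleotide substitutions that leave the encoded amino acids unchanged. The two fragments are identical at the protein level but differ at five nucleotide sites. The sequences and the reduced codon table are illustrative assumptions, not data from the study. The table is a small subset of the standard genetic code.

```python
# Small subset of the standard genetic code: only the codons used below
# (an illustrative assumption; a real analysis would use the full table).
CODON_TABLE = {
    "GCT": "A", "GCC": "A",  # alanine
    "CTG": "L", "TTA": "L",  # leucine
    "AAA": "K", "AAG": "K",  # lysine
    "GGT": "G", "GGC": "G",  # glycine
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence using CODON_TABLE."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_differences(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical fragments whose differences are all synonymous.
seq1 = "GCTCTGAAAGGT"
seq2 = "GCCTTAAAGGGC"

print(translate(seq1), translate(seq2))   # ALKG ALKG
print(nucleotide_differences(seq1, seq2))  # 5
```

      A protein-level analysis cannot see variation of this kind. That is why nucleotide sequences can help separate similarity inherited from a shared ancestor, which extends to synonymous sites, from similarity that arose independently at the protein level.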

      As for the “maximum clade credibility tree”, this is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum clade credibility tree”, while the tree that has the maximum sum of posterior clade probabilities is called the “Maximum credibility tree”. The “Maximum credibility tree” (referring to the sum) appears to have only been named in this way in the first version of TreeAnnotator. However, the version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”. So the context suggests that the “maximum clade credibility tree” option is actually maximizing the product. This “maximum clade credibility tree” is the setting I used for this project (in TreeAnnotator version 2.6.3).
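      Here is a sketch of the product-versus-sum distinction. It rests on simplifying assumptions:
      • each sampled tree is reduced to its set of clades, ignoring branch lengths and node heights;
      • a clade's posterior probability is estimated as the fraction of sampled trees that contain it;
      • the summary tree is chosen from among the sampled topologies.
      The toy sample and taxon names are hypothetical, and this is not TreeAnnotator's own code. In this sample the two criteria pick different trees. The sum favors a tree with one well-supported clade and one weak clade. The product favors a tree whose two clades are both moderately supported. The product is computed as a sum of logs to avoid numerical underflow.

```python
import math
from collections import Counter


def tree(*clades):
    """Represent a rooted tree by its non-trivial clades (root and tips omitted,
    since every sampled tree contains them and they have probability 1)."""
    return frozenset(frozenset(c) for c in clades)


# Hypothetical posterior sample of 30 rooted trees on four taxa A, B, C, D.
sample = (
    [tree("AB", "ABC")] * 7     # ((A,B),C),D
    + [tree("AB", "ABD")] * 6   # ((A,B),D),C
    + [tree("AB", "CD")] * 5    # (A,B),(C,D)
    + [tree("AC", "ACD")] * 12  # ((A,C),D),B
)

# Posterior clade probability = fraction of sampled trees containing the clade.
counts = Counter(clade for t in sample for clade in t)
prob = {clade: n / len(sample) for clade, n in counts.items()}


def log_product_score(t):
    """Log of the product of clade probabilities (maximum clade credibility)."""
    return sum(math.log(prob[c]) for c in t)


def sum_score(t):
    """Sum of clade probabilities (the 'maximum sum' alternative)."""
    return sum(prob[c] for c in t)


def label(t):
    """Readable, deterministic description of a tree's clades."""
    return sorted("".join(sorted(c)) for c in t)


candidates = set(sample)  # choose among the distinct sampled topologies
mcc = max(candidates, key=log_product_score)
max_sum = max(candidates, key=sum_score)

print("product:", label(mcc))      # product: ['AC', 'ACD']
print("sum:", label(max_sum))      # sum: ['AB', 'ABC']
```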

      We agree that readers may not fully grasp what the collapsed trees represent upon first read. We have added a sentence to the beginning of the results (lines 188-190) to make this more explicit.

      (6) Line 224, you're referring to the DPB1*09 lineage, not the DRB1*09 lineage.

      Indeed! We have corrected these typos.

      (7) Line 409, why "Differences between MHC subfamilies" and not "Differences between MHC classes"?

      We chose the word “subfamilies” because we discuss the difference between classical and non-classical genes in addition to differences between Class I and Class II genes.

      (8) Line 529-544 This might work better as a table.

      We agree! This information is now presented as Table 1.

      (9) Line 547 MHC-DRB9 appears out of the blue here - please say why you are singling it out.

      Great point! We added a paragraph (lines 614-623) to explain why this was necessary.

      (10) Line 550-551 Even though you've screened the hits manually, it would be helpful to outline your criteria for this search.

      Thank you! We’ve added a couple of sentences to explain how we did this (lines 607-610).

      (11) Line 556-580 please provide nucleotide alignments as supplementary data so that the reader can get an idea of the actual divergence of the sequences that have been aligned together.

      Thank you! We’ve added nucleotide alignments as supplementary files.

      (12) Line 651-652 Why "Maximum clade credibility tree" and not "Maximum credibility tree"? 

      Repeat of (5). This is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum clade credibility tree”, while the tree that has the maximum sum of posterior clade probabilities is called the “Maximum credibility tree”. The “Maximum credibility tree” (referring to the sum) appears to have only been named in this way in the first version of TreeAnnotator. However, the version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”. So the context suggests that the “maximum clade credibility tree” option is actually maximizing the product. This “maximum clade credibility tree” is the setting I used for this project (in TreeAnnotator version 2.6.3).

      (13) In the appendices, links to references do not work as expected.

      We will make sure these work properly when we receive the proofs.

    1. Briefing note: The forms of violence and testimony

      This briefing document explores the different forms and functions of testimony in the face of violence, drawing on Didier Fassin's analysis in "Les formes de la violence (8)".

      It highlights the importance of attesting to violence, the various figures of the witness, the challenges of representing violence, and the emergence of new technological mediations for revealing the truth.

      I. Attesting to violence: an urgent response to invisibilization

      The most common reason for writing about and representing violence is to attest to it. This task is all the more urgent when the reality has been rendered invisible. The author cites two contemporary examples of this invisibilization and of attempts to attest to it:

      French colonial violence in Algeria: A 2005 law "requires school curricula... to recognize the positive role of the French presence overseas". Despite this, works such as Alain Ruot's "La première guerre en Algérie" (2024) recall the "land seizures, population displacements, massacres of villagers, smoke-outs of caves, hundreds of thousands of deaths, mostly civilians" perpetrated by the French expeditionary corps.

      The expulsion of the Palestinians (the Nakba): The expulsion of "750,000 Palestinians, about half of the Arab population of this territory", involved the "destruction of villages and, in some cases, the murder of their inhabitants". It was long ignored.

      The film "Partition" (2025) by Dana Alan extends her book "Voices of the Nakba". It seeks to "restore the experience of the Nakba through the colonial archives of the British Mandate" and through Palestinians' own accounts.

      These undertakings seek to attest to what nations have "often buried in the depths of oblivion".

      Perpetrators may have reasons to display their violence, ranging from "the enjoyment of exercising force to the production of a regime of terror". Yet they often have "an even greater interest in concealing it, disguising it, denying it" to avoid condemnation or punishment.

      In such cases, it is crucial for victims, their relatives, and "justice entrepreneurs" (lawyers, human rights activists, researchers) to provide proof of the violence, its circumstances, and those responsible for it.

      "To attest to violence is thus to fight denial, concealment, lies, and historical revisionism. To attest to violence is to bear witness to it, to make oneself its witness."

      II. Figures of the witness: between objectivity and subjectivity

      Drawing on Émile Benveniste, the author distinguishes two conceptions of the witness, mainly through Latin:

      Testis: "one who attends, as a third party, a matter in which two persons are involved, having been present at the moment the events took place".

      The testis's word "can be used to settle a dispute provided it is established that he was not himself a party to it". The testis stands outside the scene; his observation is presumed objective.

      Superstes: "describes the witness as one who subsists beyond, at once witness and survivor".

      The superstes's testimony is authorized by having "lived through the events himself, particularly when they involved a danger or an ordeal, and having survived that peril".

      The superstes is the victim. His account is necessarily subjective, but not beyond suspicion.

      This distinction is put to the test by the literature on the Shoah.

      A. The challenge of testimony in the face of Nazi concealment

      The extermination of the Jews and the Roma was not something the Nazis boasted about. It was something they sought to conceal, including "from the German people and from themselves".

      In "Eichmann in Jerusalem", Hannah Arendt highlights the use of a "coded language", or "language rules", to euphemize the crimes: "final solution", "special treatment", "evacuation". In "ordinary parlance", these rules amounted to "a lie".

      The effect of this language system was not "to prevent people from knowing what they were doing, but to prevent them from connecting their acts with their old, normal notion of murder and lies; in short, to make mentally acceptable what might have seemed to them morally intolerable."

      Pierre Vidal-Naquet adds that this coded language facilitated later Holocaust denial.

      Aware of what was to come, the Nazis cynically warned the prisoners: "However this war may end, we have already won it against you; none of you will be left to bear witness. But even if some of you were to escape, the world will not believe you; there will be no certainty, because we will destroy the evidence by destroying you." (Primo Levi, "The Drowned and the Saved").

      This fear of not being believed haunted the survivors. Many recounted a recurring nightmare in which their loved ones did not believe them.

      Hence the vital importance of testimony, as Robert Antelme puts it: "we wanted to speak, to be heard at last".

      B. The complexity of survivors' testimony (superstes/testis)

      In writing "If This Is a Man", Primo Levi sought to "attest" to his experience.

      Yet he expresses a deep unease, believing that "we, the survivors, are not the true witnesses... for we are those who, through prevarication, skill, or luck, did not touch bottom."

      The "Muselmänner" (those so weakened that they were doomed to die) are the "complete witnesses".

      Levi's reflection tests the testis/superstes distinction:

      • He is unquestionably a superstes, having survived the unthinkable and describing the outrage of the "demolition of a man".
      • But he is also a testis. He knows he can never convey the experience of those who were devoured, and he speaks for them "in their stead, by proxy".

      The example of Hurbinek illustrates this tragic reconciliation of the two figures. Hurbinek was the paralyzed, mute child at Auschwitz in whose eyes "the need to speak burst forth with explosive force". Primo Levi writes of him: "he bears witness through these words of mine". In Fassin's words, "the superstes turned testis saves the little boy's memory from nothingness."

      C. Diversity of styles and temporalities of testimony

      The accounts of genocide survivors adopt varied styles and temporalities:

      • Immediate testimony: David Rousset ("L'univers concentrationnaire", 1946) met with rapid success despite the reluctance of European societies. This was perhaps owing to a "form of aesthetic pursuit" that creates a distance "which neutralizes emotions".

      His writing is "austere and ironic", using "elliptical and incisive phrasing, at times caustic and unsettling."

      • Deferred testimony: Charlotte Delbo ("Aucun de nous ne reviendra", 1965) wrote a first draft after her release, then returned to it 20 years later. She opens with the collective scene of the trains arriving, using short sentences and powerful images to convey "the inconceivable".

      • Anti-memoir: Imre Kertész ("Fatelessness", 1985) adopts the "naive, disconcerted" gaze of a teenager. He describes the gradual discovery of the horror of the camps, such as "the... sweetish, somehow sticky smell" of the crematorium.

      He describes "physical deterioration" without pathos, and even a "muted desire" to live at the moment of the "final sorting of the dying".

      • Distrust and refusal to be confined: Ruth Kluger ("Refus de témoigner. Une jeunesse", 1992) writes to express her distrust of the proliferation of testimonies. She also refuses to be reduced to her status as a deportee.

      • The experience of the victims of Nazism is at once "specific" and "indeterminate". It is specific because it starts from an individual lived experience. It is indeterminate because words and a form must be found in the face of "abyssal incommunicability".

      For the vast majority of survivors, one must "accept being neither superstes nor testis, and therefore remain silent."

      III. Other figures of the witness and mediations

      A. Auctor and histor: authority and knowledge

      Auctor (Latin): "one who increases confidence, the guarantor, the source, and hence the authority" and "one who drives to action, the instigator, the creator, and hence the author".

      Credit is the foundation of the auctor's testimony.

      Histor (Greek): "one who knows, who is acquainted with... the historian". Inquiry is the foundation of the histor's testimony.

      These figures did not live through the events but can vouch for them. Contemporary historians "often combine both dimensions". They benefit from "the credibility of their discipline" and draw on "inquiries conducted in archives or through interviews".

      Jean Hatzfeld's book "Dans le nu de la vie" (2000), on the Rwandan genocide, illustrates the auctor.

      He gathers survivors' accounts, taking it upon himself to persuade them to speak despite their reluctance.

      As a journalist and writer, he draws on this dual authority to "attest to what was, and still is... the experience of these men, these women, these children who lived through the massacre."

      Although the accounts are written in the first person, they are "entirely written by a third person, the author."

      • The histor is exemplified by social scientists who reconstruct and interpret the facts. They rely on "national or foreign archives, judgments handed down by international courts, local newspaper articles, interviews with people in different positions, trial observations".

      Mahmood Mamdani's work ("When Victims Become Killers", 2001) interprets the Rwandan genocide in light of colonial history. He distinguishes genocide carried out by "settlers" from genocide carried out by "natives".

      Hélène Dumas ("Le génocide au village", 2014) focuses on the "micro-local mechanics of violence". She shows that the genocide was "a matter of neighbors and relatives" and that the génocidaires "took pleasure in the suffering and humiliation of their victims."

      Beata Umubyeyi Mairesse ("Le convoi", 2024), a survivor of the Rwandan genocide, stands out for her reflexivity and integrity.

      She is at once superstes, recounting her survival, and testis, describing what she saw.

      She also makes herself the historian of her own story, exploring archives and conducting interviews. Yet "she is reluctant to assert authority" and refuses to be the auctor.

      B. Martus: the witness-martyr

      In ancient Greek, "martus" means witness. More specifically, in the Bible, it means the "witness of God", that is, the martyr: one who "accepted death to attest to his belief".

      Giorgio Agamben ("Remnants of Auschwitz", 1998) notes that Christian martyrdom had to "justify the scandal of a senseless death".

      The Arabic "shahid" has a similar meaning, designating both the witness and the martyr.

      In Palestine, the figure of the shahid developed as the "cement of national unity".

      The shahid may be a victim killed "without having chosen it" or a fighter who exposed himself "voluntarily for the cause of his people".

      This doubling transforms the meaning of martyrdom. It extends martyrdom "from freely consented sacrifice to death suffered" and "from the strictly religious to the political".

      "Every Palestinian shot or executed by the Israelis is a shahid who, by his death in an unequal confrontation, attests to his belonging to his community and bears witness to the brutality of the enemy."

      For Palestinian martyrs, sacrifice or death is a response to an "impossible life, to which death would tragically restore meaning".

      The author quotes the photojournalist Fatima Assuna: "As for death, which is inevitable: if I die, I want a resounding death. I don't want to be just a brief item in a news flash or a number among others; I want a death the whole world will hear about, a mark that will remain forever, emotions, immortal images that neither time nor space can bury."

      IV. Technological mediations of testimony

      Testimony is expressed not only through speech, writing, or the body (in the case of the martyr). It is also expressed through "mediations in which technologies can be mobilized".

      The most innovative example is Forensic Architecture, founded in 2010 by Eyal Weizman. It is an agency that develops "techniques, methods, and concepts for conducting investigations into state violence and corporate violence".

      • Forensic Architecture combines "satellite imagery, surveillance cameras, audio and video recordings, and individual and collective testimonies". From these, it produces 3D reconstructions of violent events that had been concealed.

      Among the many cases studied are:
      • the genocide of the Herero and Nama;
      • Israeli massacres during the Nakba;
      • the killing of hostages in Colombia;
      • the killing of Mark Duggan in the United Kingdom;
      • the use of European weapons in Yemen;
      • events in France (Adama Traoré, Zineb Reddouane).

      These technologies make it possible to "reveal many acts of violence". The results include "war crimes identified, culprits recognized, official versions refuted, certain truths spoken, and justice sometimes rendered".

      They "reinforce, enrich, and sometimes even replace human testimony".

      V. Conclusion: The complexity of testimony in bringing truth into being

      In summary, the author has sketched five ideal-typical figures of the witness:

      • The testis: present at the time of the events, which he can recount.
      • The superstes: a survivor, who can pass on what he lived through.
      • The auctor: an outside agent, who brings credibility.
      • The histor: a legitimate expert, who conducts an inquiry.
      • The martus: a sacrificial victim, who affirms the justness of his cause through self-renunciation.

      Each of these figures "engages political and moral forms: the veracity of the testis, the authenticity of the superstes, the authority of the auctor, the neutrality of the histor, the commitment of the martus."

      These figures are not watertight categories. In reality they "mix, combine, shift, and grow more complex".

      Beyond these distinctions, "what is at stake in testimony is making a truth exist, and in particular... making it exist against concealment, invisibilization, and denial".

      Hence the importance of "those who set out to reveal the truth, or at least the part of the truth to which they have had access."

    1. eLife Assessment

      This study provides important insights into how researchers can use perceptual metamers to formally explore the limits of visual representations at different processing stages. The framework is compelling and the data largely support the claims, subject to minor caveats.

    2. Reviewer #1 (Public review):

      This is an interesting study on the nature of representations across the visual field. The question of how peripheral vision differs from foveal vision is a fascinating and important one. The majority of our visual field is extra-foveal yet our sensory and perceptual capabilities decline in pronounced and well-documented ways away from the fovea. Part of the decline is thought to be due to spatial averaging ('pooling') of features. Here, the authors contrast two models of such feature pooling with human judgments of image content. They use much larger visual stimuli than in most previous studies, and some sophisticated image synthesis methods to tease apart the predictions of the distinct models.

      More importantly, in so doing, the researchers thoroughly explore the general approach of probing visual representations through metamers: stimuli that are physically distinct but perceptually indistinguishable. The work is embedded within a rigorous and general mathematical framework for expressing equivalence classes of images and how visual representations influence these. They describe how image-computable models can be used to make predictions about metamers, which can then be compared to make inferences about the underlying sensory representations. The main merit of the work lies in providing a formal framework for reasoning about metamers and their implications, for comparing models of sensory processing in terms of the metamers that they predict, and for mapping such models onto physiology. Importantly, they also consider the limits of what can be inferred about sensory processing from metamers derived from different models.

      Overall, the work is of a very high standard and represents a significant advance over our current understanding of perceptual representations of image structure at different locations across the visual field. The authors do a good job of capturing the limits of their approach and I particularly appreciated the detailed and thoughtful Discussion section and the suggestion to extend the metamer-based approach described in the MS with observer models. The work will have an impact on researchers studying many different aspects of visual function including texture perception, crowding, natural image statistics, and the physiology of low- and mid-level vision.

      The main weaknesses of the original submission relate to the writing. A clearer motivation could have been provided for the specific models that they consider, and the text could have been written in a more didactic and easy-to-follow manner. The authors could also have been more explicit about the assumptions that they make.

      Comments following re-submission:

      Overall, I think the authors have done a satisfactory job of addressing most of the points I raised.

      There's one final issue which I think still needs better discussion.

      I think reviewer 2 articulated better than I have the point I was concerned about: the relationship between JNDs and metamers as depicted in the schematics and indeed in the whole conceptualization.

      I think the issue here is that there seems to be a conflation of two concepts, 'subthreshold' and 'metamer', and I'm not convinced it is entirely unproblematic. It's true that two stimuli that cannot be discriminated from one another, because the physical differences are too small for the visual system to detect reliably, are a form of metamer in the strict definition 'physically different, but perceptually the same'. However, I don't think this is the scientifically substantial notion of metamer that enabled insights into trichromacy. That form of metamerism is due to the principle of univariance in feature encoding, and involves conditions in which physically very different stimuli are mapped to one and the same point in sensory encoding space whether or not there is any noise in the system. When I say 'physically very different' I mean different by a large enough amount that they would be far above threshold, potentially orders of magnitude larger than a JND, if the system's noise properties were identical but the system used a different sensory basis set to measure them. This seems to be a very different kind of 'physically different, but perceptually the same'.
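      The distinction drawn above can be made concrete with a toy linear encoder. This is a minimal sketch, assuming a hypothetical three-channel stimulus, two overlapping linear sensors, and an observer whose noise sets a fixed discrimination threshold; the weights, stimuli, and threshold are invented for illustration and do not come from the manuscript. Stimulus `b` differs from `a` only along a direction the sensors cannot register, so the two encode identically (metamers in the univariance sense), whereas `c` differs from `a` in encoding space, but by less than the threshold (subthreshold).

```python
import math

# Hypothetical encoder: 2 sensors, each summing over overlapping stimulus channels.
SENSORS = [[1, 1, 0],
           [0, 1, 1]]


def encode(stimulus):
    """Project a stimulus onto the sensor basis (matrix-vector product)."""
    return [sum(w * s for w, s in zip(row, stimulus)) for row in SENSORS]


def discriminable(x, y, threshold=1.0):
    """Toy observer: reports a difference only if the encodings differ by more than a noise-limited threshold."""
    return math.dist(encode(x), encode(y)) > threshold


null_direction = [1, -1, 1]   # encode(null_direction) == [0, 0]: invisible to both sensors
a = [10, 20, 30]
b = [s + 9 * n for s, n in zip(a, null_direction)]   # [19, 11, 39]: physically far from a
c = [10.5, 20, 30]                                   # physically close to a

print(encode(a), encode(b))                          # [30, 50] [30, 50]
print(round(math.dist(a, b), 2))                     # 15.59
# With observer noise (threshold 1.0), neither pair is discriminable...
print(discriminable(a, b), discriminable(a, c))      # False False
# ...but a noiseless observer separates the subthreshold pair, never the metameric one.
print(discriminable(a, b, 0.0), discriminable(a, c, 0.0))  # False True
```

      Removing the noise (threshold 0) makes `c` discriminable from `a` while `b` stays indistinguishable, which is the asymmetry between the two notions of 'physically different, but perceptually the same'.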

      I do think the notion of metamerism can obviously be very usefully extended beyond photoreceptors and photon absorptions. In the interesting case of texture metamers, what I think is meant is that stimuli would be discriminable if scrutinised in the fovea, but because they have the same statistics they are treated as equivalent. I think the discussion of this could still be articulated more clearly in the manuscript. It would benefit from a more thorough discussion of the difference between metamerism and subthreshold, especially in the context of the Voronoi diagrams at the beginning.

      It needs to be made clear to the reader why it is that two stimuli that are physically similar (e.g., just spanning one of the edges in the diagram) can be discriminable, while at the same time, two stimuli that are very different (e.g., at opposite ends of a cell) can't.

      Do the cells include BOTH those sets of stimuli that cannot be discriminated just because of internal noise AND those that can't be discriminated because they are projected to literally the same point in the sensory encoding space? What are the strengths and limits of models that involve the strict binarization of sensory representations, and how can they be integrated with models dealing with continuous differences? These seem like important background concepts that ought to be included in either the introduction or discussion sections. In this context it might also be helpful to refer to the notion of 'visual equivalence' as described by:

      Ramanarayanan, G., Ferwerda, J., Walter, B., & Bala, K. (2007). Visual equivalence: towards a new standard for image fidelity. ACM Transactions on Graphics (TOG), 26(3), 76-es.

      Other than that, I congratulate the authors on a very interesting study, and look forward to reading the final version.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have improved clarity overall and have spoken to most of the issues raised by the reviewers. There are still two outstanding problems, however, where issues raised during the review were inappropriately dismissed in the manuscript. These should be addressed explicitly, as a limitation of the results presented (no eye tracking) and as early pilot experiments that informed the experiments as presented (pink noise), rather than brushed off as 'unnecessary' and 'would be uninformative'.

      Eye tracking:

      It is generally accepted that experiments testing stimuli presented at specific locations in peripheral vision require eye tracking to ensure that the stimulus is presented as expected, in particular in the correct location. As I stated in the previous round of review, while a stimulus presentation time of 200 ms does help eliminate some saccades, it does not eliminate the possibility that subjects were not fixating well at stimulus onset. I am also unclear what the authors mean by 'trained observer' in this context, though the authors state that an author subject in a different portion of the paper is an 'expert observer'. Does this mean the 'trained observers' are non-expert recruited subjects? Given that the conditions tested differ from previous work (Freeman & Simoncelli, 2011), which DID include eye tracking in a subset of subjects (and *these differences are a main contribution of the paper!*), it is entirely possible to obtain results similar to those reported here under stimulus presentation that is not controlled by eye tracking. The reasons now given in the manuscript do not justify considering eye tracking 'unnecessary'.

      I appreciate that the authors now state the lack of eye tracking explicitly, but believe the paper needs to at least state that this is a limitation of the reported results; describing eye tracking as 'considered unnecessary' is unreasonable, and it is not the norm in this subfield.
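      One common form of the fixation control discussed above is to record gaze during each presentation and exclude trials in which gaze leaves a small region around the fixation point. The sketch below assumes gaze samples already calibrated to degrees of visual angle and a hypothetical 1° tolerance; the function names, trial format, and gaze values are illustrative, and calibration, blink handling, and sampling rate are not modelled.

```python
import math


def fixation_ok(gaze_samples, fixation=(0.0, 0.0), tolerance_deg=1.0):
    """True if every gaze sample (x, y), in degrees of visual angle, stays within tolerance of fixation."""
    return all(math.dist(sample, fixation) <= tolerance_deg for sample in gaze_samples)


def keep_fixated_trials(trials, **criteria):
    """Drop trials whose gaze samples during stimulus presentation left the fixation region."""
    return [trial for trial in trials if fixation_ok(trial["gaze"], **criteria)]


# Hypothetical gaze samples recorded during a 200 ms presentation.
trials = [
    {"id": 1, "gaze": [(0.1, -0.2), (0.3, 0.1)]},   # steady fixation
    {"id": 2, "gaze": [(0.2, 0.0), (2.5, 0.4)]},    # drifted ~2.5 deg during the stimulus
]
print([t["id"] for t in keep_fixated_trials(trials)])  # [1]
```

      The tolerance is the key design parameter: it bounds how far the stimulus can land from its intended retinal eccentricity on any retained trial.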

      N=1: The authors now state clearly the limitations of a single subject in the manuscript, and state the expertise level of this subject.

      Large number of trials: The authors now address this and include an enumeration of the large number of trials.

      Simple Models / Physiology comparison: I support the choice to reduce claims regarding tight connections to physiology, and appreciate the explanation of the luminance model.

      Previous Work: I appreciate the authors' changes to the introduction, both in discussing previous work and citation fixes.

      Blurred White, Pink Noise: While the authors now address pink noise, the explanation for why such stimuli are expected to be uninformative is confusing to me. The manuscript first states that pink noise is a natural choice, then claims it would be uninformative, while the rebuttal (not the manuscript) states that the authors tried it and that it did reduce the artifacts they note. The logic of the experiments indeed relies on finding the smallest critical scaling value, which is measured by subjects determining whether a synthesis is similar to or different from a target or a second synthesis. A synthesis free from artifacts would surely affect the subjects' responses and the smallest critical scaling measured.

      The statement that the authors experimented with pink noise early on and found this able to address the artifacts should be stated in the manuscript itself, not just in the rebuttal, and the blanket statement that this experiment would be 'uninformative' is incorrect. Surely this early pilot the authors mention in the rebuttal was informative to designing the experiments that appear in the final paper, and would be an informative experiment to include.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting study of the nature of representations across the visual field. The question of how peripheral vision differs from foveal vision is a fascinating and important one. The majority of our visual field is extra-foveal yet our sensory and perceptual capabilities decline in pronounced and well-documented ways away from the fovea. Part of the decline is thought to be due to spatial averaging (’pooling’) of features. Here, the authors contrast two models of such feature pooling with human judgments of image content. They use much larger visual stimuli than in most previous studies, and some sophisticated image synthesis methods to tease apart the predictions of the distinct models.

      More importantly, in so doing, the researchers thoroughly explore the general approach of probing visual representations through metamers: stimuli that are physically distinct but perceptually indistinguishable. The work is embedded within a rigorous and general mathematical framework for expressing equivalence classes of images and how visual representations influence these. They describe how image-computable models can be used to make predictions about metamers, which can then be compared to make inferences about the underlying sensory representations. The main merit of the work lies in providing a formal framework for reasoning about metamers and their implications, for comparing models of sensory processing in terms of the metamers that they predict, and for mapping such models onto physiology. Importantly, they also consider the limits of what can be inferred about sensory processing from metamers derived from different models.

      Overall, the work is of a very high standard and represents a significant advance over our current understanding of perceptual representations of image structure at different locations across the visual field. The authors do a good job of capturing the limits of their approach and I particularly appreciated the detailed and thoughtful Discussion section and the suggestion to extend the metamer-based approach described in the MS with observer models. The work will have an impact on researchers studying many different aspects of visual function including texture perception, crowding, natural image statistics, and the physiology of low- and mid-level vision.

      The main weaknesses of the original submission relate to the writing. A clearer motivation could have been provided for the specific models that they consider, and the text could have been written in a more didactic and easy-to-follow manner. The authors could also have been more explicit about the assumptions that they make.

      Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.

      Reviewer #2 (Public Review):

      Summary

      This paper expands on the literature on spatial metamers, evaluating different aspects of spatial metamers including the effect of different models and initialization conditions, as well as the relationship between metamers of the human visual system and metamers for a model. The authors conduct psychophysics experiments testing variations of metamer synthesis parameters including type of target image, scaling factor, and initialization parameters, and also compare two different metamer models (luminance vs energy). An additional contribution is doing this for a field of view larger than has been explored previously.

      General Comments

      Overall, this paper addresses some important outstanding questions regarding comparing original to synthesized images in metamer experiments and begins to explore the effect of noise vs image seed on the resulting syntheses. While the paper tests some model classes that could be better motivated, and the results are not particularly groundbreaking, the contributions are convincing and undoubtedly important to the field. The paper includes an interesting Voronoi-like schematic of how to think about perceptual metamers, which I found helpful, but for which I do have some questions and suggestions. I also have some major concerns regarding incomplete psychophysical methodology including lack of eye-tracking, results inferred from a single subject, and a huge number of trials. I have only minor typographical criticisms and suggestions to improve clarity. The authors also use very good data reproducibility practices.

      Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.

      Specific Comments

      Experimental Setup

      Firstly, the experiments do not appear to utilize an eye tracker to monitor fixation. Without eye tracking or another manipulation to ensure fixation, we cannot ensure the subjects were fixating the center of the image, and viewing the metamer as intended. While the short stimulus time (200ms) can help minimize eye movements, this does not guarantee that subjects began the trial with correct fixation, especially in such a long experiment. While Covid-19 did at one point limit in-person eye-tracked experiments, the paper reports no such restrictions that would have made the addition of eye-tracking impossible. While such a large-scale experiment may be difficult to repeat with the addition of eye tracking, the paper would be greatly improved with, at a minimum, an explanation as to why eye tracking was not included.

      Addressed on pg. 25, starting on line 658.

      Secondly, many of the comparisons later in the paper (Figures 9, 10) are made from a single subject. N=1 is not typically accepted as sufficient to draw conclusions in such a psychophysics experiment. Again, if there were restrictions limiting this, they should be discussed. Also (p. 11), is subject sub-00 an author? Another expert? A naive subject? The subject’s expertise in viewing metamers will likely affect their performance.

      Addressed on pg. 14, starting on line 308.

      Finally, the number of trials per subject is quite large. 13,000 over 9 sessions is much larger than most human experiments in this area. The reason for this should be justified.

      In general, we needed a large number of trials to fit full psychometric functions for stimuli derived for both models, with both types of comparison, both initializations, and over many target images. We could have eliminated some of these, but feel that having a consistent dataset across all these conditions is a strength of the paper.

      In addition to the sentence on pg. 14, line 318, a full enumeration of trials is now described on pg. 23, starting on line 580.

      Model

      For the main experiment, the authors compare the results of two models: a ’luminance model’ that spatially pools mean luminance values, and an ’energy model’ that spatially pools energy calculated from a multi-scale pyramid decomposition. They show that these models create metamers that result in different thresholds for human performance, and therefore different critical scaling parameters, with the basic luminance pooling model producing a scaling factor 1/4 that of the energy model. While this is certain to be true, due to the luminance model being so much simpler, the motivation for the simple luminance-based model as a comparison is unclear.

      The use of simple models is now addressed on pg. 3, starting on line 98, as well as the sentence starting on pg. 4 line 148: the luminance model is intended as the simplest possible pooling model.
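      The role of the scaling factor can be sketched in one dimension. Assuming hard-edged windows whose width is the scaling factor times the eccentricity of the window's inner edge (foveated pooling models typically use two-dimensional, smoothly weighted windows, so this captures only the general relationship), a smaller scaling value tiles the same eccentricity range with more, smaller windows, each contributing a pooled statistic that a model metamer must match. The scaling values below are hypothetical, chosen only to mirror the fourfold ratio between the two models' critical scaling mentioned in the review.

```python
# Simplified 1D sketch: each window's width is the scaling factor times the
# eccentricity of its inner edge (an assumption, not the manuscript's exact windows).
def window_edges(scaling, e_min, e_max):
    """Window edges in degrees, from e_min to e_max, growing geometrically."""
    if scaling <= 0:
        raise ValueError("scaling must be positive")
    edges = [e_min]
    while edges[-1] < e_max:
        edges.append(min(edges[-1] * (1 + scaling), e_max))
    return edges


for s in (0.5, 0.125):  # hypothetical values; the second is a quarter of the first
    n_windows = len(window_edges(s, 1.0, 32.0)) - 1
    print(f"scaling={s}: {n_windows} windows between 1 and 32 deg")
# scaling=0.5: 9 windows between 1 and 32 deg
# scaling=0.125: 30 windows between 1 and 32 deg
```

      Because the windows grow geometrically, their number scales roughly with 1 / log(1 + scaling), so quartering the scaling factor here more than triples the number of pooled constraints.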

      The authors claim that this luminance model captures the response of retinal ganglion cells, often modeled as a center-surround operation (Rodieck, 1964). I am unclear in what aspect(s) the authors claim these center-surround neurons mimic a simple mean luminance, especially in the context of evidence supporting a much more complex role of RGCs in vision (Atick & Redlich, 1992). Why do the authors not compare the energy model to a model that captures center-surround responses instead? Do the authors mean to claim that the luminance model captures only the pooling aspects of an RGC model? This is particularly confusing as Figures 6 and 9 show the luminance and energy models for original vs synth aligning with the scaling of Midget and Parasol RGCs, respectively. These claims should be more clearly stated, and citations included to motivate this. Similarly, with the energy model, the physiological evidence is very loosely connected to the model discussed.

      We have removed the bars showing potential scaling values measured by electrophysiology in the primate visual system and attempted to clarify our language around the relationship between these models and physiology. Our metamer models are only loosely connected to the physiology, and we’ve decided in revision not to imply any direct connection between the model parameters and physiological measurements. The models should instead be understood as loosely inspired by physiology, but not as a tool to localize the representation (as was done in the Freeman paper).

      The physiological scaling values are still used as the mean of the priors on the critical scaling value for model fitting, as described on pg. 27, starting on line 698.

      Prior Work:

      While the explorations in this paper clearly have value, it does not present any particularly groundbreaking results, and those reported are consistent with previous literature. The explorations around critical eccentricity measurement have been done for texture models (Figure 11) in multiple papers (Freeman 2011, Wallis 2019, Balas 2009). In particular, Freeman 2011 demonstrated that simpler models, representing measurements presumed to occur earlier in visual processing, need smaller pooling regions to achieve metamerism. This work’s measurements for the simpler models tested here are consistent with those results, though the model details are different. In addition, Brown 2023 (which is miscited) also used an extended field of view (though not as large as in this work). Both Brown 2023 and Wallis 2019 performed an exploration of the effect of the target image. Also, much of the more recent previous work uses color images, while the authors’ exploration is only done for greyscale.

      We were pleased to find consistency of our results with previous studies, given the (many) differences in stimuli and experimental conditions (especially viewing angle), while also extending to new results with the luminance model, and the effects of initialization. Note that only one of the previous studies (Freeman and Simoncelli, 2011) used a pooled spectral energy model. Moreover, of the previous studies, only one (Brown et al., 2023) used color images (we have corrected that citation - thanks for catching the error).

      Discussion of Prior Work:

      The prior work on testing metamerism between original vs. synthesized and synthesized vs. synthesized images is presented in a misleading way. Wallis et al.’s prior work on this should not be a minor remark in the post-experiment discussion. Rather, it was surely a motivation for the experiment. The text should make this clear; a discussion of Wallis et al. should appear at the start of that section. The authors similarly cite much of the most relevant literature in this area as a minor remark at the end of the introduction (P3L72).

      The large differences we observed between comparison types (original vs synthesized, compared to synthesized vs synthesized) surprised us. Understanding such differences was not a primary motivation for the work, but it is certainly an important component of our results. In the introduction, we thought it best to lay out the basic logic of the metamer paradigm for foveated vision before mentioning the complications that are introduced in both the Wallis and Brown papers (paragraph beginning p. 3, line 109). Our results confirm and bolster the results of both of those earlier works, which are now discussed more fully in the Introduction (lines 109 and following).

      White Noise: The authors make an analogy to the inability of humans to distinguish samples of white noise. It is unclear, however, that human difficulty distinguishing samples of white noise is a perceptual issue; it could instead be due to cognitive or memory limitations. If one concentrates on an individual patch, one can usually tell two samples apart. Either evidence that these difficulties emerge from perceptual limitations should be provided, the possibility that these limitations are more cognitive should be discussed, or a different analogy should be employed.

      We now note the possibility of cognitive limits on pg. 8, starting on line 243, as well as pg. 22, line 571. The ability of observers to distinguish samples of white noise is highly dependent on display conditions. A small patch of noise (i.e., large pixels, not too many) can be distinguished, but a larger patch cannot, especially when presented in the periphery. This is more generally true for textures (as shown in Ziemba and Simoncelli (2021)). Samples of white noise at the resolution used in our study are indistinguishable.

      Relatedly, in Figure 14, the authors do not explain why the white noise seeds would be more likely to produce syntheses that end up in different human equivalence classes.

      In Figure 14, we claim that white noise seeds are more likely to end up in the same human equivalence classes than natural image seeds. The explanation as to why we think this may be the case is now addressed on pg. 19, starting on line 423.

      It would be nice to see the effect of pink noise seeds, which mirror the power spectrum of natural images, but do not contain the same structure as natural images - this may address the artifacts noted in Figure 9b.

      The lack of pink noise seeds is now addressed on pg. 19, starting on line 429.
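      A seed with a natural-image-like spectrum, of the kind suggested above, can be generated by giving each frequency an amplitude that falls off as a power of frequency, drawing random phases, and inverting the discrete Fourier transform. This is a minimal 1D, plain-Python sketch; `noise_1f` and its `alpha` parameter are hypothetical names. With `alpha=1.0` the amplitude falls as 1/f (power as 1/f²), the fall-off commonly reported for natural images; noise whose power falls as 1/f, the strict meaning of 'pink', corresponds to `alpha=0.5`.

```python
import cmath
import math
import random


def noise_1f(n, rng, alpha=1.0):
    """Real 1D noise with amplitude spectrum 1/f**alpha and uniformly random phases (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    spectrum = [0j] * n                                  # DC bin stays zero: a zero-mean seed
    for k in range(1, n // 2):
        spectrum[k] = cmath.rect(1.0 / k ** alpha, rng.uniform(0.0, 2.0 * math.pi))
        spectrum[n - k] = spectrum[k].conjugate()        # Hermitian symmetry -> real signal
    spectrum[n // 2] = rng.choice((-1.0, 1.0)) / (n // 2) ** alpha   # Nyquist bin must be real
    # Naive inverse DFT, O(n**2); adequate for a small illustration.
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def amplitude_spectrum(signal):
    """Magnitude of the (unnormalized) forward DFT of a signal."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
            for k in range(n)]


seed = noise_1f(64, random.Random(1))
amps = amplitude_spectrum(seed)
print([round(amps[k] * k, 6) for k in (1, 2, 8, 32)])  # [1.0, 1.0, 1.0, 1.0]
```

      An image seed would apply the same recipe to a 2D spectrum, with amplitude depending on radial frequency, and an FFT library would replace the naive transforms.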

      Finally, the authors note high-frequency artifacts in Figure 4 & P5L135, that remain after syntheses from the luminance model. They hypothesize that this is due to a lack of constraints on frequencies above that defined by the pooling region size. Could these be addressed with a white noise image seed that is pre-blurred with a low pass filter removing the frequencies above the spatial frequency constrained at the given eccentricity?

      The explanation for this is similar to the lack of pink noise seeds in the previous point: the goal of metamer synthesis is model testing, and so for a given model, we want to find model metamers that result in the smallest possible critical scaling value. Taking white noise seed images and blurring them will almost certainly remove the high frequencies visible in luminance metamers in Figure 4 and thus result in a larger critical scaling value, as the reviewer points out. However, the logic of the experiments requires finding the smallest critical scaling value, and so these model metamers would be uninformative. In an early stage of the project, we did indeed synthesize model metamers using pink noise seeds, and observed that the high-frequency artifacts were less prominent.
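      The reasoning about these artifacts can be illustrated in one dimension. The sketch below assumes fixed-size, non-overlapping pooling windows and uses a box blur as the low-pass filter; neither choice comes from the manuscript, and the reviewer does not specify a filter. It shows that a pooled-luminance constraint leaves the highest frequencies free (an alternating pattern that cancels within each window changes no pooled mean), and that a pre-blurred white-noise seed starts out with much less high-frequency energy. Whether synthesis would preserve that difference is not modelled here.

```python
import math
import random

# Simplified 1D sketch: fixed-size, non-overlapping windows stand in for the model's pooling regions.


def pooled_means(signal, window_size):
    """Mean luminance in each non-overlapping window of window_size samples."""
    return [sum(signal[i:i + window_size]) / window_size
            for i in range(0, len(signal), window_size)]


def add_zero_mean_pattern(signal, window_size, amplitude):
    """Add an alternating +amplitude/-amplitude pattern (the highest representable frequency).

    Windows start at multiples of window_size, so an even window_size makes the
    pattern sum to zero inside every window and leaves the pooled means unchanged.
    """
    if window_size % 2:
        raise ValueError("window_size must be even")
    return [v + (amplitude if i % 2 == 0 else -amplitude) for i, v in enumerate(signal)]


def box_blur(signal, radius):
    """Moving-average low-pass filter; windows are truncated at the signal edges."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def high_freq_energy(signal):
    """Mean squared difference between neighbouring samples, a crude high-frequency measure."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:])) / (len(signal) - 1)


WINDOW = 8  # hypothetical pooling window size, in samples
target = [0.5] * 64
metamer = add_zero_mean_pattern(target, WINDOW, 0.25)

# Same pooled luminance, very different high-frequency content.
same_pooled = all(math.isclose(p, q) for p, q in
                  zip(pooled_means(target, WINDOW), pooled_means(metamer, WINDOW)))
print(same_pooled, high_freq_energy(target), high_freq_energy(metamer))  # True 0.0 0.25

# A pre-blurred white-noise seed starts with far less high-frequency energy.
rng = random.Random(0)
white_seed = [rng.random() for _ in range(64)]
blurred_seed = box_blur(white_seed, radius=2)
print(high_freq_energy(white_seed), high_freq_energy(blurred_seed))
```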

      Schematic of metamerism: Figures 1,2,12, and 13 show a visual schematic of the state space of images, and their relationship to both model and human metamers. This is depicted as a Voronoi diagram, with individual images near the center of each shape, and other images that fall at different locations within the same cell producing the same human visual system response. I felt this conceptualization was helpful. However, implicitly it seems to make a distinction between metamerism and JND (just noticeable difference). I felt this would be better made explicit. In the case of JND, neighboring points, despite having different visual system responses, might not be distinguishable to a human observer.

      Thanks for noting this – in general, metamers are subthreshold, and for the purpose of the diagram, we had to discretize the space showing metameric regions (Voronoi regions) around a set of stimuli. We’ve rewritten the captions to explain this better. We address the binary subthreshold nature of the metamer paradigm in the discussion section (pg. 19, line 438).
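      The discretization described in this response can be illustrated with a nearest-prototype (Voronoi) assignment. This is a minimal sketch assuming a hypothetical two-dimensional stimulus space with three cell centres; the coordinates are invented. Because equivalence is decided by cell membership alone, two stimuli straddling an edge count as different while two stimuli far apart within one cell count as metameric, which is the behaviour Reviewer #1 asks the authors to explain.

```python
import math


def cell_of(stimulus, prototypes):
    """Index of the nearest prototype, i.e. the stimulus's cell in a Voronoi partition."""
    return min(range(len(prototypes)), key=lambda i: math.dist(stimulus, prototypes[i]))


def metameric(x, y, prototypes):
    """Binary judgment: 'same' iff both stimuli fall in the same cell."""
    return cell_of(x, prototypes) == cell_of(y, prototypes)


prototypes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # hypothetical cell centres
near_a, near_b = (0.49, 0.0), (0.51, 0.0)           # 0.02 apart, straddling an edge
far_a, far_b = (0.55, 0.2), (1.45, -0.2)            # ~0.98 apart, inside one cell

print(metameric(near_a, near_b, prototypes))   # False
print(metameric(far_a, far_b, prototypes))     # True
```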

      In these diagrams and throughout the paper, the phrase ’visual stimulus’ rather than ’image’ would improve clarity, because the location of the stimulus in relation to the fovea matters whereas the image can be interpreted as the pixels displayed on the computer.

      We agree and have tried to make this change, describing this choice on pg. 3 line 73.

      Other

      The authors show good reproducibility practices with links to relevant code, datasets, and figures.

      Reviewer #1 (Recommendations For The Authors):

      In its current form, I found the introduction to be too cursory. I felt that the article would benefit from a clearer motivation for the two models that are considered, as the reader is left unclear why these particular models are of special scientific significance. The luminance model is intended to capture some aspects of retinal ganglion cell response characteristics and the spectral energy model is intended to capture some aspects of the primary visual cortex. However, one can easily imagine models that include the pooling of other kinds of features, and it would be helpful to get an idea of why these are not considered. Which aspects of processing in the retina and V1 are being considered and which are being left out, and why? Why not consider representations that capture even higher-order statistical structure than those covered by the spectral energy model (or even semantics)? I think a bit of rewriting with this in mind could improve the introduction.

      Along similar lines, I would have appreciated having the logic of the study explained more explicitly and didactically: which overarching research question is being asked, how it is operationalised in the models and experiments, and what are the predictions of the different models. Figures 2 and 3 are certainly helpful, but I felt further explanations would have made it easier for the reader to follow. Throughout, the writing could be improved by a careful re-reading with a view to making it easier to understand. For example, where results are presented, a sentence or two expanding on the implications would be helpful.

      I think the authors could also be more explicit about the assumptions they make. While these are obviously (tacitly) included in the description of the models themselves, it would be helpful to state them more openly. To give one example, when introducing the notion of critical scaling, on p.6 the authors state as if it is a self-evident fact that "metamers can be achieved with windows whose size is matched to that of the underlying visual neurons". This presumably is true only under particular conditions, or when specific assumptions about readout from populations of neurons are invoked. It would be good to identify and state such assumptions more directly (this is partly covered in the Discussion section ’The linking proposition underlying the metamer paradigm’, but this should be anticipated or moved earlier in the text).

      We agree that our introduction was too cursory and have reworked it. We have also backed off of the direct comparison to physiology and clarified that we chose these two as the simplest possible pooling models. We have also added sentences at the end of each result section attempting to summarize the implication (before discussing them fully in the discussion). Hopefully the logic and assumptions are now clearer.

      There are also some findings that warrant a more extensive discussion. For example, what is the broader implication of the finding that original vs. synthesised and synthesised vs. synthesised comparisons exhibit very different scaling values? Does this tell us something about internal visual representations, or is it simply capturing something about the stimuli?

      We believe this difference is a result of the stimuli that are used in the experiment and thus the synthesis procedure itself, which interacts with the model’s pooled image features. We have attempted to update the relevant figures and discussions to clarify this, in the sections starting on pg. 17 line 396 and pg. 19 line 417.

      At some points in the paper, a third model (’texture model’) creeps into the discussion, without much explanation. I assume that this refers to models that consider joint (rather than marginal) statistics of wavelet responses, as in the famous Portilla & Simoncelli texture model. However, it would be helpful to the reader if the authors could explain this.

      Addressed on pg. 3, starting on line 94.

      Minor corrections.

      Caption of Figure 3: ’top’ and ’bottom’ should be ’left’ and ’right’

      Line 177: ’smallest tested scaling values tested’. Remove one instance of ’tested’

      Line 212: ’the images-specific psychometric functions’ -> ’image-specific’

      Line 215: ’cloud-like pink noise’. It’s not literally pink noise, so I would drop this.

      Line 236: ’Importantly, these results cannot be predicted from the model, which gives no specific insight as to why some pairs are more discriminable than others’. The authors should specify what we do learn from the model if it fails to provide insight into why some image pairs are more discriminable than others.

      Figure 9: it might be helpful to include small insets with the ’highway’ and ’tiles’ source images to aid the reader in understanding how the images in 9B were generated.

      Table 1 placement should be after it is first referred to on line 258.

      In the Discussion section "Why does critical scaling depend on the comparison being performed", it would be helpful to consider the case where the two model metamers *are* distinguishable from each other even though each is indistinguishable from the target image. I would assume that this is possible (e.g., if the target image is at the midpoint between the two model images in image space and each of the stimuli is just below 1 JND away from the target). Or is this not possible for some reason?

      Regarding line 236: this specific line has been removed, and the discussion about this issue has all been consolidated in the final section of the discussion, starting on pg. 19 line 438.

      Regarding the final comment: this is addressed in the paragraph starting on pg. 16 line 386. To expand upon that: the situation laid out by the reviewer is not possible in our conceptualization, in which metamerism is transitive and image discriminability is binary. In order to investigate situations like the one laid out by the reviewer, one needs models whose representations have metric properties, i.e., which allow you to measure and reason about perceptual distance, which we refer to in the paragraph starting on pg. 20 line 460. We also note that this situation has not been observed in this or any other pooling model metamer study that we are aware of. All other minor changes have been addressed.
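The contrast between the two conceptualizations can be written out explicitly. The notation below is introduced here for illustration and is not the paper's:
- $f$ is the model's pooled representation.
- $T$ is the target image, and $A$ and $B$ are two model metamers.
- $d$ is a perceptual distance with discrimination threshold $\delta$ (one JND).

```latex
\begin{align*}
f(A) = f(T) \;\text{and}\; f(B) = f(T) \;&\Rightarrow\; f(A) = f(B)
  && \text{(binary metamerism: transitive)}\\
d(A,T) < \delta \;\text{and}\; d(B,T) < \delta \;&\Rightarrow\; d(A,B) \le d(A,T) + d(T,B) < 2\delta
  && \text{(metric model: only bounded)}
\end{align*}
```

When metamerism means equal model representations, $A$ and $B$ must also be metamers of each other. With a metric, the triangle inequality only bounds $d(A,B)$ below $2\delta$. The reviewer's case, $\delta \le d(A,B) < 2\delta$, is therefore permitted.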

      Reviewer #2 (Recommendations For The Authors):

      Original image T should be marked in the Voronoi diagrams.

      Brown et al. is miscited as 2021; it should be ACM Transactions on Applied Perception, 2023.

      Figure 3 caption: models are left and right, not top and bottom.

      Thanks, all of the above have been addressed.

      References

      Brown R, Dutell V, Walter B, Rosenholtz R, Shirley P, McGuire M, Luebke D. Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System. ACM Transactions on Applied Perception. 2023 Jan; 20(1):1–22. doi: 10.1145/3564605.

      Freeman J, Simoncelli EP. Metamers of the ventral stream. Nature Neuroscience. 2011 Aug; 14(9):1195–1201. doi: 10.1038/nn.2889.

      Ziemba CM, Simoncelli EP. Opposing Effects of Selectivity and Invariance in Peripheral Vision. Nature Communications. 2021 Jul; 12(1). doi: 10.1038/s41467-021-24880-5.

    1. We all know that establishing a TCP connection requires a three-way handshake

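      Two sockets on the same port: by default they cannot both bind and listen at the same time, so they cannot each establish connections (port reuse under special configurations does not change the core rule that the "four-tuple is unique"). The role of a port is to "distinguish services", while the uniqueness of a connection is guaranteed by the "four-tuple" (source IP, source port, destination IP, destination port). Multiple ports therefore naturally support more independent connections, which is also why servers usually open several ports (e.g., 80, 443, 3306) to provide different services.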
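The two rules above can be checked on a local machine with a minimal Python sketch that uses only the standard `socket` module. It assumes an IPv4 loopback interface and default socket options (no `SO_REUSEADDR` or `SO_REUSEPORT`). The function names are hypothetical.
- The first function shows that a second socket is typically refused when it binds to a port that already has a listener.
- The second function shows that one listening port still serves several connections, because each connection has a distinct four-tuple.

```python
import errno
import socket


def second_listener_bind_fails() -> bool:
    """Bind a listening socket to a free loopback port, then try to bind a second
    socket to the same port with default options. Returns True if the OS refuses."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            # EADDRINUSE on POSIX; WSAEADDRINUSE on Windows
            return exc.errno in (errno.EADDRINUSE, getattr(errno, "WSAEADDRINUSE", None))
        return False
    finally:
        second.close()
        first.close()


def client_four_tuples(n_clients: int = 3) -> list:
    """Open several connections to ONE listening port and return each connection's
    (src_ip, src_port, dst_ip, dst_port) four-tuple as seen by the client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    opened = []
    try:
        server.bind(("127.0.0.1", 0))
        server.listen()
        tuples = []
        for _ in range(n_clients):
            # The kernel performs the three-way handshake inside create_connection
            client = socket.create_connection(server.getsockname())
            opened.append(client)
            conn, _addr = server.accept()
            opened.append(conn)
            tuples.append(client.getsockname() + client.getpeername())
        return tuples
    finally:
        for s in opened:
            s.close()
        server.close()


print(second_listener_bind_fails())
for four_tuple in client_four_tuples():
    print(four_tuple)
```

On most systems the first line prints `True`. The printed four-tuples differ only in the client's ephemeral source port.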

    1. eLife Assessment

      This study presents a valuable finding relating to how the state of arousal is represented within the superior colliculus, a principal visuo-oculomotor structure. The main conclusion that the representation of arousal is segregated, and thus influences visual activity but not motor output, is incompletely supported by the evidence, but could be stronger if a specific concern relating to an alternative explanation for the dichotomy was addressed. The work will be of interest to sensory, motor, and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies that allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

    3. Reviewer #2 (Public review):

      Summary:

      Neurons in motor-related areas have increasingly been shown to also carry other, non-motor signals. This raises the problem of how interference between motor and non-motor signals is avoided, a significant problem that likely affects many brain areas. The specific example studied here is interference between saccade-related activity and slowly changing arousal signals in the superior colliculus. The authors identify neuronal activity related to saccades and arousal. Identifying saccade-related activity is straightforward, but arousal-related activity is harder to identify. The authors first identify a potential neuronal correlate of arousal, using PCA to identify a component in the population activity corresponding to slow drift over the recording session. Next, they link this component to arousal by showing that the component is present across different brain areas (SC and PFC) and that it is correlated with pupil size, an external marker of arousal. Having identified an arousal-related component in SC, the authors next show that SC neurons with strong motor-related activity are less strongly affected by this arousal component (both SC and PFC). Lastly, they show that SC population activity patterns related to saccades and pupil size form orthogonal subspaces in the SC population.

      Strengths:

      A great strength of this research is the clear description of the problem, its relationship with the performed analysis, and the interpretation of the results. The paper is very well written and easy to follow. An additional strength is the use of fairly sophisticated analyses of population activity.

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlations specifically. The authors should test whether this interpretation is correct. If so, where are the low and high correlations, respectively? Are there potentially two functional areas in SC?

      Comments on revised manuscript:

      I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., without other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the manuscript finds changes in SC activity and immediately describes them as 'arousal-related'.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

    4. Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift, as in the cortex. They then computed a motor index for SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that slow drift in neuronal activity was more prevalent in SC neurons with a low motor index and less prevalent in those with a high motor index. In addition, the authors measured pupil diameter and found it to correlate with slow drift in neuronal activity, but only in SC neurons with a lower motor index. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.
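The excerpt does not give the motor index formula. A common way to build an index with the behavior described above is a normalized contrast between motor-epoch and visual-epoch firing rates. The minimal Python sketch below assumes that form and non-negative rates. The function name and example rates are hypothetical, and this is not the authors' definition.

```python
def motor_index(visual_rate: float, motor_rate: float) -> float:
    """Hypothetical contrast-style motor index in [-1, 1] for non-negative rates:
    below 0 when visual > motor, above 0 when motor > visual."""
    total = visual_rate + motor_rate
    if total <= 0:
        raise ValueError("index undefined when both rates are zero")
    return (motor_rate - visual_rate) / total


print(motor_index(30.0, 10.0))  # visually dominated neuron -> -0.5
print(motor_index(10.0, 30.0))  # motor-dominated neuron -> 0.5
```

A ratio such as `motor / (motor + visual)` in [0, 1] would rank neurons the same way. Which form the paper uses should be checked against its Methods.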

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced, since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.

      Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.

      I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display, which is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.
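The reviewer's point can be made quantitative with the standard definition of retinal illuminance in trolands: luminance (cd/m²) multiplied by pupil area (mm²). For a fixed gray background, the light reaching the retina therefore scales with the square of the pupil diameter. The Python sketch below assumes hypothetical values: a 30 cd/m² background and pupil diameters of 4 and 5 mm. It also ignores the Stiles-Crawford effect and ocular media losses.

```python
import math


def pupil_area_mm2(diameter_mm: float) -> float:
    """Area of a circular pupil in mm^2."""
    return math.pi * (diameter_mm / 2.0) ** 2


def retinal_illuminance_td(luminance_cd_m2: float, pupil_diameter_mm: float) -> float:
    """Retinal illuminance in trolands: luminance (cd/m^2) times pupil area (mm^2).
    Ignores the Stiles-Crawford effect and ocular media losses."""
    return luminance_cd_m2 * pupil_area_mm2(pupil_diameter_mm)


gray_background = 30.0  # hypothetical display luminance, cd/m^2
ratio = retinal_illuminance_td(gray_background, 5.0) / retinal_illuminance_td(gray_background, 4.0)
print(f"{ratio:.4f}")  # (5/4)**2 -> 1.5625
```

The luminance cancels in the ratio. A 25% larger pupil diameter therefore admits about 56% more light from the same background, whatever the display luminance. Smaller fluctuations scale the same way, with the square of the diameter ratio.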

      Comments on revised manuscript:

      The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and that this is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide clearer details below:

      In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.

      Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye from the gray background also changes. Thus, visually responsive neurons will have different amounts of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually responsive neurons and could contribute to the larger effects seen in the more visual neurons.

      The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.

      I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? That is, how can neurons have a motor burst for the target of the memory-guided saccade and also for microsaccades? Which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The authors make fairly strong claims that "arousal-related fluctuations are isolated from neurons in the deep layers of the SC" (emphasis added). This conclusion is based on comparisons between a "slow drift axis", a low-dimensional representation of neuronal drift, and other measures of arousal (Figures 2C, 3) and motor output sensitivity (Figures 2B, 3B). However, the metrics used to compare the slow-drift axis and motor activity were computed during separate task epochs: the delay period (600-1100 ms) and a perisaccade epoch (25 ms before and after saccade initiation), respectively. As the authors reference, deep-layer SC neurons are typically active only around the time of a saccade. Therefore, it is not clear if the lack of arousal-related modulations reported for deep-layer SC neurons is because those neurons are truly insensitive to those modulations, or if the modulations were not apparent because they were assessed in an epoch in which the neurons were not active. A potentially more valuable comparison would be to calculate a slow-drift axis aligned to saccade onset. 

      The reviewer makes an important point that the calculation of an axis can depend critically on the time window of the neuronal response. On considering this, we find that the slow drift axis is less sensitive to this issue because it is calculated on activity averaged over multiple trials. In previous work we found that slow drift calculated on the stimulus-evoked response in V4 was very well aligned to slow drift calculated on pre-stimulus spontaneous activity (Cowley et al., Neuron, 2020, Supplemental Figures 3A and 3B). To address this issue in the present data, we compared the axes computed for an example session from neural activity during the delay period and from neural activity aligned to saccade onset. As shown in the new Figure 2 – figure supplement 1 of the revised manuscript, we found a similar lack of arousal-related modulations for deep-layer SC neurons when slow drift was computed using the saccade epoch (25ms before to 25ms after the onset of the saccade). Figure 2 – figure supplement 1A shows loadings for the SC slow drift axis when it was computed using spiking responses during the delay period (as in the main manuscript analysis). In contrast, Figure 2 – figure supplement 1B shows loadings from the same session when the SC slow drift axis was computed using spiking responses during the saccade epoch. The plots are highly similar, and in both cases the loadings were weaker for neurons recorded from channels at the bottom of the probe, which have a higher motor index. Finally, we found that projections onto the SC slow drift axis for this session were strongly correlated when the slow drift axis was computed using spiking responses during the delay period and the saccade epoch (r = 0.66, p < 0.001, Figure 1C). Taken together, these results suggest that arousal-related modulations are less evident in deep-layer SC neurons irrespective of whether slow drift was computed during the delay or saccade epoch (see also Public Reviews, Reviewer 1, Point 2).
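Here is a minimal Python/NumPy sketch of why an axis computed on trial-averaged activity can be robust to the choice of epoch. It assumes a simple boxcar moving average over trials followed by PCA (via SVD). This only approximates the slow drift procedure of Cowley et al. (2020). The window length, simulated data and function names are hypothetical. A single slow latent drives every simulated neuron in both a "delay" and a "saccade" epoch, with different baselines, gains and noise levels.

```python
import numpy as np


def boxcar_smooth(counts: np.ndarray, window: int) -> np.ndarray:
    """Moving average over trials for each neuron; counts is (trials x neurons)."""
    kernel = np.ones(window) / window
    return np.column_stack(
        [np.convolve(counts[:, j], kernel, mode="valid") for j in range(counts.shape[1])]
    )


def slow_drift(counts: np.ndarray, window: int = 50):
    """Return (axis, projection): the first principal component of the smoothed
    population activity and the smoothed activity projected onto it."""
    smoothed = boxcar_smooth(counts, window)
    centered = smoothed - smoothed.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = PC loadings
    axis = vt[0]
    return axis, centered @ axis


# Simulated session (hypothetical): one slow latent drives every neuron in both epochs
rng = np.random.default_rng(0)
n_trials, n_neurons = 1000, 20
latent = np.sin(np.linspace(0, 2 * np.pi, n_trials))
gains = rng.uniform(0.5, 1.5, n_neurons)
delay = 10 + np.outer(latent, gains) + rng.normal(0, 2, (n_trials, n_neurons))
saccade = 20 + 0.8 * np.outer(latent, gains) + rng.normal(0, 3, (n_trials, n_neurons))

axis_delay, proj_delay = slow_drift(delay)
axis_sacc, proj_sacc = slow_drift(saccade)
# PC signs are arbitrary, so compare magnitudes
r = abs(np.corrcoef(proj_delay, proj_sacc)[0, 1])
cos = abs(axis_delay @ axis_sacc)
print(round(r, 3), round(cos, 3))
```

In this noise-light simulation, both the projections and the axes agree closely across epochs. Real recordings, with fewer neurons and weaker shared signals, would be expected to agree less closely.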

      (2) More generally, arousal-related signals may persist throughout multiple different epochs of the task. It would be worthwhile to determine whether similar "slow-drift" dynamics are observed for baseline, sensory-evoked, and saccade-related activity. Although it may not be possible to examine pupil responses during a saccade, there may be systematic relationships between baseline and evoked responses. 

      Similar to the point above, slow drift dynamics tend to be similar across different response epochs because they are averaged across many trials and seem to tap into responsivity trends that are robust across epochs. As shown in Author response image 1 below, and in Figure 2 – figure supplement 1 of the revised manuscript, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. We did not investigate differences between baseline and evoked pupil responses in the current paper. However, these effects were characterized in one of our previous papers that focused exclusively on the relationship between slow drift and eye-related metrics (Johnston et al., 2022, Cereb. Cortex, Figure 6). In this previous work, we found a negative correlation between baseline and evoked pupil size. Both variables were significantly correlated with slow drift, the only difference being the sign of the correlation.

      Author response image 1.

      (A-C) Dynamics of slow drift for three example sessions when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. Baseline = 100ms before the onset of the target stimulus; Delay = 600 to 1100ms after the offset of the target stimulus; Stim = 25ms to 125ms after the onset of the target stimulus; Sac = 25ms before to 25ms after the onset of the saccade.

      Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.

      (3) The relationships between changes in SC activity and pupil size are quite small (Figures 2C & 5C). Although the distribution across sessions (Figure 2C) is greater than chance, they are nearly 1/4 of the size compared to the PFC-SC axis comparisons. Likewise, the distribution of r² values relating pupil size and spiking activity directly (Figure 5) is quite low. We remain skeptical that these drifts are truly due to arousal and cannot be accounted for by other factors. For example, does the relationship persist if accounting for a very simple, monotonic (e.g., linear) drift in pupil size and overall firing rate over the course of an individual session?

      Firstly, it is important to note that the strength of the relationship between projections onto the SC slow drift axis and pupil size (r² = 0.06) is within the range reported by Joshi et al. (2016, Neuron, Figure 3). They investigated the median variance explained between the spiking responses of individual SC neurons and pupil size and found it to be approximately 0.02 across sessions. Secondly, our statistical approach of testing the actual distribution of r² values against a shuffled distribution was specifically designed to rule out the possibility that the relationship between SC spiking responses and pupil size occurred due to linear drifts. The shuffled distribution in Figure 2C of the main manuscript represents the variance that can be explained by one session's slow drift correlated with another session's pupil, which would contain effects that occurred due to linear drifts alone. That the actual proportion of variance explained was significantly greater than this distribution suggests that the relationship between projections onto the SC slow drift axis and pupil size reflects changes in arousal rather than other factors related to linear drifts.

      Joshi S, Li Y, Kalwani RM, Gold JI (2016) Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex. Neuron 89:221–234.
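The logic of the cross-session control can be sketched in Python/NumPy. Pairing one session's slow drift with another session's pupil trace keeps any generic monotonic trend present in every session but removes session-specific co-fluctuations. Matched pairs should therefore explain more variance than mismatched pairs only if such co-fluctuations exist. The following are assumptions made for illustration:
- the simulated data;
- truncating unequal-length series to the shorter one;
- comparing medians.

The exact shuffle and statistical test used in the paper are not reproduced here.

```python
import numpy as np


def r_squared(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def cross_session_null(drifts, pupils):
    """Actual r^2 for matched (drift_i, pupil_i) pairs, and a null distribution from
    mismatched pairs (drift_i, pupil_j), i != j. Unequal lengths are truncated to
    the shorter series (an assumption of this sketch)."""
    actual = np.array([r_squared(d, p) for d, p in zip(drifts, pupils)])
    null = []
    for i, d in enumerate(drifts):
        for j, p in enumerate(pupils):
            if i != j:
                n = min(len(d), len(p))
                null.append(r_squared(d[:n], p[:n]))
    return actual, np.array(null)


# Simulated sessions (hypothetical): every session shares the same linear trend,
# but only matched drift/pupil pairs share a session-specific slow fluctuation.
rng = np.random.default_rng(1)
drifts, pupils = [], []
for _ in range(12):
    n_bins = int(rng.integers(40, 80))
    trend = np.linspace(1.0, -1.0, n_bins)
    shared = np.cumsum(rng.normal(0, 1, n_bins)) / 5
    drifts.append(trend + shared + rng.normal(0, 0.5, n_bins))
    pupils.append(trend + shared + rng.normal(0, 0.5, n_bins))

actual, null = cross_session_null(drifts, pupils)
print(f"median actual r^2 = {np.median(actual):.2f}, median null r^2 = {np.median(null):.2f}")
```

In this simulation the null distribution is well above zero because of the shared linear trend alone. Only the session-specific co-fluctuation pushes the matched r² values beyond it.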

      (4) It is not clear how the final analysis (Figure 6) contributes to the authors' conclusions. The authors perform PCA on: (i) residual spiking responses during the delay period binned according to pupil size, and (ii) spiking responses in the saccade epoch binned according to target location (i.e., the saccade tuning curve). The corresponding PCs are the spike-pupil axis and the saccade tuning axis, respectively. Unsurprisingly, the spike-pupil axis that captures variance associated with arousal (and removes variance associated with saccade direction) was not correlated with a saccade-tuning axis that captures variance associated with saccade direction and omits arousal. Had these measures been related, it would imply a unique association between a neuron's preferred saccade direction and pupil control, which seems unlikely. The separation of these axes thus seems trivial and does not provide evidence of a "mechanism...in the SC to prevent arousal-related signals interfering with the motor output." It remains unknown whether, for example, arousal-related signals may impact trial-by-trial changes in neuronal gain near the time of a saccade, or alter saccade dynamics such as acceleration, precision, and reaction time.

      The reviewer makes a good point, and we agree that more evidence is needed to determine if the separation of the pupil size axis and saccade tuning axis is the mechanism through which cognitive and arousal-related signals can be intermixed in the SC. In the revised manuscript (lines 679-682), we have raised this as a possible explanation that necessitates further study rather than stating definitively that it is the exact mechanism through which these signals are kept separate. Our analysis here is similar to the one from Smoulder et al (2024, Neuron, Fig. 2F), in which the interactions between reward signals and target tuning in M1 were examined (and found to be orthogonal). While we agree with the reviewer that it may seem “trivial” for these axes to be orthogonal, it does not have to be so. If, for example, neural tuning curves shifted with changes in pupil size through gain changes that revealed tuning or affected tuning curve shape, there could be projections of the pupil axis onto the target tuning axis. Thus, while we agree with the reviewer that it appears sensible for these two axes to be orthogonal, our result is nonetheless a novel finding. We have edited the text in our revised manuscript, however, to make sure the nuance of this point is conveyed to the reader.

      Smoulder AL, Marino PJ, Oby ER, Snyder SE, Miyata H, Pavlovsky NP, Bishop WE, Yu BM, Chase SM, Batista AP. A neural basis of choking under pressure. Neuron. 2024 Oct 23;112(20):3424-33.
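One way to quantify whether a spike-pupil axis projects onto a saccade-tuning axis is the absolute cosine between the first principal components of the two binned response matrices. The Python/NumPy sketch below assumes this measure; the statistic in the paper's Figure 6 may differ. It uses toy, noise-free data for two cases. In one case, pupil-linked modulation lies along a neural direction orthogonal to the tuning. In the other, it shares the tuning direction, which is the kind of non-orthogonality the response notes could in principle occur. Function and variable names are hypothetical.

```python
import numpy as np


def first_pc(binned: np.ndarray) -> np.ndarray:
    """Unit-length first principal axis (over neurons) of a (bins x neurons) matrix."""
    centered = binned - binned.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def axis_alignment(pupil_binned: np.ndarray, tuning: np.ndarray) -> float:
    """|cos| between spike-pupil and saccade-tuning axes: 0 = orthogonal, 1 = parallel."""
    return abs(float(first_pc(pupil_binned) @ first_pc(tuning)))


# Toy population of 4 neurons (hypothetical, noise-free)
u = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)  # direction modulated by pupil size
v = np.array([0.0, 0.0, 1.0, -1.0]) / np.sqrt(2)  # direction modulated by target location
pupil_bins = np.linspace(-1.0, 1.0, 5)[:, None]                  # 5 pupil-size bins
targets = np.linspace(0, 2 * np.pi, 8, endpoint=False)[:, None]  # 8 saccade targets

tuning = 10 + np.cos(targets) * v
orthogonal_case = axis_alignment(10 + pupil_bins * u, tuning)
shared_case = axis_alignment(10 + pupil_bins * v, tuning)
print(round(orthogonal_case, 6), round(shared_case, 6))
```

With noisy data, the observed value should be compared against a shuffled or otherwise chance-level distribution rather than against exactly zero.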

      Reviewer #2 (Public Review):

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder, why the authors did not design an experiment, in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior. 

      Although attention does represent an important cognitive process, we did not design an experiment in which attention and oculomotor control are differentiated because attention does not appear to be related to slow drift. In our first paper that reported on this phenomenon, we investigated the effects of spatial attention on slow fluctuations in neural activity by cueing the monkeys to attend to a stimulus in the left or right visual field in a block-wise manner. Each block lasted ~20 minutes and we found that slow drift did not covary with the timing of cued blocks (see Figure 4A, Cowley et al., 2020, Neuron). Furthermore, there is a large body of work showing that arousal also impacts motor behavior leading to changes in a range of eye-related metrics (e.g., pupil size, microsaccade rate and saccadic reaction time - for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). We also note that the terms attention and arousal are often used in nonspecific and overlapping ways in the literature, adding to some potential confusion here. Nonetheless, pupil-linked arousal is an important variable that impacts motor performance. This has now been stated clearly in the Introduction of the revised manuscript (lines 108-114) to address the reviewer’s concerns and highlight the importance of studying how precise fixation and eye movements are maintained even in the presence of signals related to ongoing changes in brain state. 

      Cowley BR, Snyder AC, Acar K, Williamson RC, Yu BM, Smith MA (2020) Slow Drift of Neural Activity as a Signature of Impulsivity in Macaque Visual and Prefrontal Cortex. Neuron 108:551-567.e8.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results. 

      As described above, several studies across species have demonstrated that arousal impacts motor behavior e.g., saccade reaction time, saccade velocity and microsaccade rate (for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). This has been clarified in the Introduction of the revised manuscript to address the reviewer's concerns (lines 108-114). Our prior work (Johnston et al, Cerebral Cortex, 2022) shows that slow drift impacts several types of oculomotor behavior. Overall, these studies highlight the impact of arousal on eye movements as a robust effect, and support the present investigation into arousal and oculomotor control signals. While we agree reaction time, accuracy, and speed all can be influenced by arousal depending on task demands, the present study is focused on the connection between slow fluctuations in neural activity, linked to arousal, and different subpopulations of SC neurons. 

      Di Stasi LL, Catena A, Cañas JJ, Macknik SL, Martinez-Conde S (2013) Saccadic velocity as an arousal index in naturalistic tasks. Neurosci Biobehav Rev 37:968–975.

      Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlations specifically. The authors should test whether this interpretation is correct. If so, where are the low and high correlations, respectively? Are there potentially two functional areas in SC?

      We agree with the reviewer that our actual data distribution was non-uniform. We examined individual sessions with high and low variance explained and did not find notable differences. One source of this variation is session length: longer sessions should in principle have a chance distribution of variance explained closer to zero because they contain more time bins. Given that we had no specific hypothesis for a non-uniform distribution, we have simply displayed the full distribution of values in our figure, together with the statistical result of a comparison to a shuffled distribution.
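The expectation that longer sessions have a chance distribution closer to zero follows from a standard result. For n independent pairs of Gaussian samples with no true relationship, the expected squared correlation is 1/(n - 1). The Python/NumPy simulation below checks this for a few hypothetical bin counts. It assumes independent samples. Slowly drifting time series are autocorrelated, which inflates chance r² above this value. That inflation is one motivation for an empirical shuffle null rather than this formula.

```python
import numpy as np


def mean_null_r2(n_bins: int, n_reps: int = 5000, seed: int = 0) -> float:
    """Average squared Pearson correlation between two independent Gaussian series."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_reps, n_bins))
    y = rng.standard_normal((n_reps, n_bins))
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return float(np.mean(r ** 2))


for n in (20, 80, 320):
    # simulated mean chance r^2 next to the analytic value 1/(n - 1)
    print(n, round(mean_null_r2(n), 4), round(1 / (n - 1), 4))
```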

      Reviewer #3 (Public Review):

      (1) However, I am concerned about two main points: First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC? In other words, it seems important to show distributions of encountered neurons (regardless of the motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of Figure Supplement 1. I elaborate more on these points in the detailed comments below. 

      The reviewer makes a good point about the efferent signals from SC. It is true that electrical thresholds are often lowest in intermediate layers, though deep layers do project to the oculomotor nuclei (Sparks, 1986; Sparks & Hartwich-Young, 1989) and often intermediate and deep layers are considered to function together to control eye movements (Wurtz & Albano, 1980). As suggested by the reviewer, we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index, as well as included the above references and points about the intermediate and deep layers (Lines 73-81). Aside from the question of which layers of the SC function as the “motor output”, the reviewer raises a separate and important question – are our deep recordings still in SC. Here, we can say definitively that they are. We removed neurons if they did not exhibit elevated (above baseline) firing rates during the visual or saccade epochs of the MGS task (see Methods section on “Exclusion criteria”). All included neurons possessed a visual, visuomotor or motor response, consistent with the response properties of neurons in the SC. In addition, we found a number of neurons well above the bottom of the probe with strong motor responses and minimal loadings onto the slow drift axis (see Figure 2 – figure supplement 1A), consistent with the reviewer’s comment that intermediate layer neurons are tuned for movement and play a role in saccade production.

      Mohler CW, Wurtz RH. Organization of monkey superior colliculus: intermediate layer cells discharging before eye movements. J Neurophysiol. 1976 Jul;39(4):722-44.

      Sparks DL. Translation of sensory signals into commands for control of saccadic eye movements: role of primate superior colliculus. Physiol Rev. 1986 Jan;66(1):118-71. doi: 10.1152/physrev.1986.66.1.118. PMID: 3511480.

      Sparks DL, Hartwich-Young R. The deep layers of the superior colliculus. Rev Oculomot Res. 1989;3:213-55.

      Wurtz RH, Albano JE. Visual-motor function of the primate superior colliculus. Annu Rev Neurosci. 1980;3:189-226. doi: 10.1146/annurev.ne.03.030180.001201. PMID: 6774653.

      (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC. 

      The reviewer makes an important point about the SC’s visual responses. Neurons with a low motor index are, conversely, likely to have a stronger visual response index. However, we do not believe that changes in luminance can explain why the correlation between SC spiking responses and pupil size is stronger for neurons with a lower motor index. First, the changes in pupil size observed in the current paper and in our previous work are slow, occur on a timescale of minutes (Cowley et al., 2020, Neuron), and are correlated with eye movement measures such as reaction time and microsaccade rate (Johnston et al., 2022, Cerebral Cortex). This is in stark contrast to luminance-evoked changes in pupil size, which occur on a timescale of less than a second. Second, as shown in the new Figure 5 – figure supplement 1 of the revised manuscript, very similar results were found when SC spiking responses were correlated with pupil size during the baseline period, when only the fixation point was on the screen. Although the small peripheral target stimulus can cause small luminance-evoked changes in pupil size, no changes in luminance occurred during the baseline period, which was defined as the 100 ms before target onset. In Figure 2 – figure supplement 1 and Author response image 1 above, we show that slow drift is the same whether it is calculated from the baseline, delay-period, or peri-saccadic response. Thus, the measurement of slow drift is insensitive to the precise choice of both the spiking-response window and the pupil-measurement window. If luminance were the explanation for the slow changes in firing observed in visually responsive SC neurons, those neurons would need to exhibit robust, sustained, tuned responses to the small changes in retinal illuminance induced by the relatively small minute-to-minute fluctuations in pupil size that we observed. We are aware of no reports of such behavior in visually responsive SC neurons. We have included these analyses and this reasoning in the revised manuscript (lines 478-495).

      Reviewer #1 (Recommendations for the authors):

      (1) It would be useful to provide line numbers in subsequent manuscripts for reviewers.

      Line numbers have been added in the revised version of the manuscript.

      (2) Page #6; last sentence: "...even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in motor output." I do not believe the authors have provided evidence that arousal levels were not associated with changes in motor output.

      As suggested by Reviewer 3 (see Public Reviews, Reviewer 3, Point 1), we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index. This sentence in the revised manuscript now reads:

      “This provides a potential mechanism through which signals related to cognition and arousal can exist in the SC, and even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in SC neurons that are linked to saccade execution.”

      (3) Page #8; last paragraph: Although deep-layer SC neurons may not have been obtained during every recording session, a summary of the motor index scores observed along the probe across sessions would be useful to confirm their assumptions. 

      See Author response image 2 below, which shows the motor index of each recorded SC neuron on the x-axis and the session number on the y-axis. The points are colored according to the squared factor loading, which represents the proportion of variance in a neuron's response that is explained by the slow drift axis (see Figure 3B of the main manuscript). This plot shows that neurons with a stronger loading (teal to yellow) typically have a lower motor index, whereas the opposite is true for neurons with a weaker loading (dark blue).

      Author response image 2.

      Scatter plot showing the motor index of each recorded neuron and the session in which it was recorded. Points are colored according to each neuron's squared factor loading on the slow drift axis. Loadings above 0.5 (33 data points in total) were clipped at 0.5 so that the color range could effectively show all of the slow drift axis loadings.
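      To make the link between a squared loading and variance explained concrete, here is a minimal sketch. It assumes, for illustration only, that the slow drift axis is the first principal component of the correlation matrix of (smoothed) residual responses; the function name and the simulated data are hypothetical and do not reproduce the authors' analysis pipeline.

```python
import numpy as np


def squared_loadings_first_pc(responses):
    """Squared loadings of each neuron on the first principal component.

    responses: array of shape (n_timepoints, n_neurons). Working with the
    correlation matrix standardizes each neuron, so each squared loading is
    the fraction of that neuron's variance explained by the first component.
    """
    corr = np.corrcoef(responses, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top_val, top_vec = eigvals[-1], eigvecs[:, -1]
    # Loading = eigenvector * sqrt(eigenvalue) = correlation between the
    # neuron and the first-component score (its sign is arbitrary).
    loadings = top_vec * np.sqrt(top_val)
    return loadings**2, top_val


# Simulated example: one slow shared signal, weighted differently per neuron.
rng = np.random.default_rng(0)
n_time, n_neurons = 500, 10
drift = np.cumsum(rng.normal(size=n_time))      # slow random-walk signal
weights = np.linspace(0.0, 1.0, n_neurons)      # neuron 0 ignores the drift
responses = np.outer(drift, weights) + rng.normal(scale=5.0, size=(n_time, n_neurons))

sq_loadings, first_eigval = squared_loadings_first_pc(responses)
print(np.round(sq_loadings, 2))
```

      In this sketch the squared loadings sum to the first eigenvalue, and a neuron that does not follow the shared signal gets a squared loading near zero, which is the sense in which a weak loading means the neuron contributes little variance to the slow drift axis.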

      (4) Page #10; first paragraph: The authors should state the time window of the delay period used, since it may be distinct from the pupil analysis (first 200ms of delay). 

      This has been stated in the revised version of the manuscript. The sentence now reads:

      “We first asked if arousal-related fluctuations are present in the SC. As in previous studies that recorded from neurons in the cortex (Cowley et al., 2020), we found that the mean spiking responses of individual SC neurons during the delay period (chosen at random on each trial from a uniform distribution spanning 600-1100ms, see Methods) fluctuated over the course of a session while the monkeys performed the MGS task (Figure 2A, left).”
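      Because the delay duration varies from trial to trial, spike counts must be normalized by each trial's own delay length before mean responses are compared across trials. The sketch below illustrates this under stated assumptions: spike times are aligned to delay onset, the 600-1100 ms range is taken from the quoted sentence, and the function names are hypothetical.

```python
import numpy as np


def sample_delays(n_trials, rng, low_ms=600.0, high_ms=1100.0):
    """Draw one delay duration (ms) per trial from a uniform distribution."""
    return rng.uniform(low_ms, high_ms, size=n_trials)


def mean_delay_rate(spike_times_ms, delay_ms):
    """Mean firing rate (spikes/s) within [0, delay_ms) for a single trial.

    spike_times_ms: spike times in ms, assumed aligned to delay onset.
    """
    spikes = np.asarray(spike_times_ms, dtype=float)
    n_spikes = np.count_nonzero((spikes >= 0.0) & (spikes < delay_ms))
    return n_spikes / (delay_ms / 1000.0)


rng = np.random.default_rng(1)
delays = sample_delays(5, rng)
# Hypothetical neuron firing regularly every 50 ms (about 20 spikes/s).
regular_spikes = np.arange(0.0, 1200.0, 50.0)
rates = [mean_delay_rate(regular_spikes, d) for d in delays]
```

      Dividing by the trial-specific window keeps a neuron with a constant firing rate at roughly the same value regardless of the sampled delay, so trial-to-trial changes in the mean reflect changes in firing rather than in window length.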

      (5) Page #10; second paragraph: Extra period at the end of a sentence: " most variance in the data..". 

      Fixed in the revised version of the manuscript.

      (6) Page #12: "between projections onto the SC slow drift axis and mean pupil size during the first 200ms of the delay period when a task-related pupil response could be observed." What criteria were used to determine whether a task-related pupil response was observed? 

      This window was chosen based on the results of a previous study in our lab that used the same memory-guided saccade task to investigate the relationship between slow drift and changes in baseline and evoked pupil size (see Johnston et al., 2022, Cereb. Cortex, Figure 6B). The window was chosen by plotting the average pupil size aligned to different trial epochs. As we show in Figure 5 – figure supplement 3 above, the interactions between pupil size and slow drift did not depend on the particular pupil time window we chose.

      (7) Page #14; Figure 2A: The axes for the individual channels are strangely floating and quite different from all other figures. Please label the channel in the figure legend that was used as an example of the projected values onto the slow drift axis.

      The figure has been changed in the revised version of the manuscript so that the tick mark denoting zero residual spikes per second is on the top layer of each plot. We used a scale bar instead of individual axes to reduce clutter, because the purpose of the figure is to illustrate how slow drift was computed. Residual spiking responses from all neurons were projected onto the slow drift axis to generate the scatter plot in the bottom right-hand corner of Figure 2A, so there is no single neuron to label.
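      The projection described here can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that residuals are obtained by subtracting each task condition's mean response, that residuals are smoothed with a moving average over consecutive trials (the 50-trial window is arbitrary), and that the slow drift axis is the first principal component of the smoothed residuals. All function names and simulated data are hypothetical.

```python
import numpy as np


def residuals_by_condition(counts, conditions):
    """Subtract each task condition's mean response from that condition's trials.

    counts: (n_trials, n_neurons) spike counts or rates per trial.
    conditions: (n_trials,) condition label per trial (e.g. target angle).
    """
    resid = np.empty(counts.shape, dtype=float)
    for c in np.unique(conditions):
        idx = conditions == c
        resid[idx] = counts[idx] - counts[idx].mean(axis=0)
    return resid


def moving_average(x, window):
    """Moving average over consecutive trials, applied to each neuron separately."""
    kernel = np.ones(window) / window
    return np.column_stack(
        [np.convolve(x[:, j], kernel, mode="valid") for j in range(x.shape[1])]
    )


def slow_drift_projection(counts, conditions, window=50):
    """Return a unit-length slow drift axis and each trial's projection onto it."""
    resid = residuals_by_condition(counts, conditions)
    smoothed = moving_average(resid, window)
    centered = smoothed - smoothed.mean(axis=0)
    # First principal axis = first right singular vector of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    # Project every trial's (unsmoothed) residual response onto the axis.
    return axis, resid @ axis


# Simulated session: 8 target conditions plus a slow drift shared across neurons.
rng = np.random.default_rng(2)
n_trials, n_neurons = 800, 20
conditions = rng.integers(0, 8, size=n_trials)
tuning = rng.normal(10.0, 3.0, size=(8, n_neurons))
true_drift = 3.0 * np.sin(np.linspace(0.0, 3.0 * np.pi, n_trials))
weights = rng.uniform(0.5, 1.5, size=n_neurons)
counts = (
    tuning[conditions]
    + np.outer(true_drift, weights)
    + rng.normal(0.0, 2.0, size=(n_trials, n_neurons))
)

axis, projection = slow_drift_projection(counts, conditions)
```

      Because the projection pools residuals across all neurons, each point in such a scatter plot is a population quantity rather than a single neuron's response. The sign of the SVD axis is arbitrary, so only the time course of the projection, not its sign, is meaningful.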

      (8) Page #16: "These results demonstrate that even though arousal-related fluctuations are present in the SC, they are isolated from deep-layer neurons that elicit a strong saccadic response and presumably reside closer to the motor output." In line with our major comments, lack of arousal-related activity during the delay period is meaningless for deep-layer SC neurons that are generally inactive during this time. It does not imply that there is no arousal signal! 

      Addressed in Public Reviews, Reviewer 1, Point 1 & 2. We found a similar lack of arousal-related modulation in deep-layer SC neurons when slow drift was computed using the saccade epoch (Figure 1 above). In addition, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade periods (Figure 2).

      (9) Page #18: "These findings provide additional support for the hypothesis that arousal-related fluctuations are isolated from neurons in the deep layers of the SC." The same criticism from above applies.

      Addressed in Public Reviews, Reviewer 1, Point 1 & 2.

      (10) Page #20; paragraph 3: "Taken together, the findings outlined above..." Would be useful to be more specific when referring to "activity" ; e.g., "...these neurons did not exhibit large fluctuations in delay-period activity over time".

      This sentence has been changed in the revised manuscript in light of the reviewer’s comments. It now reads:

      “In addition to being more weakly correlated with pupil size, the spiking responses of these neurons did not exhibit large fluctuations over time (Figure 2), and when considering the neuronal population as a whole, explained less variance in the slow drift axis when it was computed using population activity in the SC (Figure 3) and PFC (Figure 4).”

      Reviewer #3 (Recommendations for the authors):

      The paper is clear and well-written. However, I am concerned about two main points: 

      (1) First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC. In other words, it seems important to show distributions of encountered neurons (regardless of motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of the figure supplement 1. I elaborate more on these points in the detailed comments below. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC. 

      Addressed in Public Reviews, Reviewer 3, Point 2.

      (3) I think that a remedy to the first point above is to change the text to make it a bit more descriptive and less interpretive. For example, just say that the slow drifts were less evident among the neurons with high motor index. 

      We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 1).

      (4) For the second point, I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. 

      We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 2).

      (5) Line 31: I'm a bit underwhelmed by this kind of statement. i.e. we already know that cognitive processes and brain states do alter eye movements, so why is it "critical" that high precision fixation and eye movements are maintained? And, isn't the next sentence already nulling this idea of criticality because it does show that the brain state alters the SC neurons? In fact, cognitive processes are already known to be most prevalent in the intermediate and deep layers of the SC. 

      It seems clear that while cognitive state does affect eye movements, it is desirable to have some separation between cognitive state and eye movement control. Covert attention, for instance, is precisely a situation where eye movement control is maintained to avoid overt saccades to the attended stimulus, and yet there are clear indications of attention’s impact on microsaccades and fixation. We stand by our statement that an important goal of vision is to have precise fixation and movements of the eye, and yet at the same time the eyes are subject to numerous influences by cognitive state.

      (6) Line 65: it is better to clarify that these are "functional layers" because there are actually more anatomical layers. 

      We have edited this sentence in the revised version of the manuscript so that it now reads:

      “The role of these projections in the visuomotor transformation depends on the functional layer of the SC in which they terminate”.

      (7) Line 73: this makes it sound like only the deepest layers are topographically organized, which is not true. Also, as early as Mohler & Wurtz, 1972, it was suggested that the intermediate layers have the biggest impacts downstream of the SC. This is also consistent with electrical microstimulation current thresholds for evoking saccades from the SC. 

      We have addressed the reviewer’s comments about the intermediate layers having the biggest impact downstream of the SC in Public Reviews, Reviewer 3, Point 1. Furthermore, line 73 has been changed in the revised manuscript so that it now reads:

      “As is the case for neurons in the superficial and intermediate layers, they [SC motor neurons] form a topographically organized map of visual space (White et al. 2017; Robinson 1972; Katnani and Gandhi 2011)”.  

      (8) Line 100: there is an analogous literature regarding the question of why unwanted muscle contractions do not happen. Specifically, in the context of why SC visual bursts do not automatically cause saccades (which is a similar problem to the ones you mention about cognitive signals interfering by generating unwanted eye movements), both Jagadisan & Gandhi, Curr Bio, 2022 and Baumann et al, PNAS, 2023 showed that SC population activity not only has a different temporal structure (Jagadisan & Gandhi) but also occupies different subspaces (Baumann et al) under these two conditions (visual burst versus saccade burst). This is conceptually similar to the idea that you are mentioning here with respect to arousal. So, it is worth mentioning these studies here and again in the discussion. 

      We are grateful to the reviewer for these suggestions and have included text in the Introduction (Lines 125-128) and Discussion (Lines 678-682) of the revised manuscript along with the references cited above.

      (9) Line 147: as mentioned above, it is now generally accepted that there are relatively few "pure" motor neurons in the SC. This is consistent with what you find. E.g. Baumann et al., 2023. And, again see Mohler and Wurtz in the 1970's. So, I wonder how useful it is to go too much into this idea of the deeper motor neurons (e.g. the correlations in the other panels of the Figure 1 supplement). 

      This is related to the reviewer’s comment that the output of the SC might be in the intermediate layers. This concern has been addressed in Public Reviews, Reviewer 3, Point 1.

      (10) Figure 1 should say where the RF was for the shown spike rasters. i.e. were these the same saccade target across trials? And where was that location relative to the RF? It would help also in the text to say whether the saccade was always to the RF center or whether you were randomizing the target location. 

      We centered the array of saccade targets on the eccentricity of the eye movement evoked by SC microstimulation (see Methods section “Memory-guided saccade task”), and then placed eight saccade targets at that eccentricity, spaced 45 degrees apart starting at 0 degrees (the rightward target). We did not perform extensive RF mapping beyond this microstimulation-based centering. In Figure 1, the spike rasters are shown for a target that was visually identified as lying within the neuron’s RF based on its responses to all 8 target angles. We have added this information to the figure caption.
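      The target layout described above can be illustrated with a short sketch. The 10-degree eccentricity is an arbitrary example value (in the experiment it came from the microstimulation-evoked saccade), and the convention that polar angle increases counterclockwise from rightward (0 degrees), with positive y meaning upward, is an assumption.

```python
import numpy as np


def target_positions(eccentricity_deg, n_targets=8, start_deg=0.0):
    """Return (x, y) target positions in degrees of visual angle.

    Targets lie on a circle of radius `eccentricity_deg`, evenly spaced in
    polar angle (45 degrees apart when n_targets == 8), starting at
    `start_deg` (0 = rightward) and increasing counterclockwise.
    """
    angles_deg = start_deg + np.arange(n_targets) * 360.0 / n_targets
    angles_rad = np.deg2rad(angles_deg)
    return np.column_stack(
        (eccentricity_deg * np.cos(angles_rad), eccentricity_deg * np.sin(angles_rad))
    )


targets = target_positions(eccentricity_deg=10.0)
for angle, (x, y) in zip(range(0, 360, 45), targets):
    print(f"{angle:3d} deg: x = {x:6.2f}, y = {y:6.2f}")
```

      With eight evenly spaced targets at the evoked eccentricity, one of the targets is expected to fall in or near the RF, which is consistent with selecting the raster target by inspecting responses to all 8 angles rather than by separate RF mapping.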

      (11) Line 218: but were there changes in the eye movement statistics? For example, the slow drift eye movements during fixation? Or even the microsaccades? 

      Addressed in Public Reviews, Reviewer 2, Point 2.  

      (12) Line 248: shuffling what exactly? I think that more explanation would be needed here. 

      Addressed in Public Reviews, Reviewer 1, Point 3.  

      (13) Line 263: but isn't this reflecting a sensory transient in the pupil diameter, since the target just disappeared? 

      Addressed in Public Reviews, Reviewer 3, Point 2.  

      (14) Line 271: I suspect that slow drift eye movements (in between microsaccades) would show higher correlations. Not sure how well you can analyze those with a video-based eye tracker. 

      We agree that fixational drift would be a worthwhile metric, but it is not one we have focused on here and, to our knowledge, it requires higher-precision eye tracking.

      (15) Line 286: again, see above about similar demonstrations with respect to the visual and motor burst intervals, which clearly cause the same problem (even stronger) as the one studied here. 

      See reply, including Figure 2.

      (16) Line 330: again, I'm not sure deeper necessarily automatically means closer to the output. For example, current thresholds for evoked saccades grow higher as you go deeper. Maybe the authors can ask their colleague Neeraj Gandhi about this point specifically, just to be safe. Maybe the safest would be to remain descriptive about the data, and just say something like: arousal-related fluctuations were absent in our deepest recorded sites. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (17) Line 332: likewise, statements like this one here would need to be qualified if the output was the intermediate layers. Anyway, if I understand what I have read so far in the paper, the signal will be orthogonal to the motor burst population subspace anyway. So, maybe there's no need to emphasize that it goes away in the very deepest layers. 

      See reply above, Public Reviews, Reviewer 1, Point 4.

      (18) Figure 3A: related to the above, I think one issue could be that the deeper contacts might already be out of the SC. Maybe some cell count distribution from each channel should help in this regard. i.e. were you finding way fewer saccade-related neurons in the deepest channels (even though the few that you found were with high motor index)? If so, then wouldn't this just mean that the channel was too deep? I think there needs to be an analysis like this, to convince readers that the channels were still in the SC. Ideally, electrical stimulation current thresholds for evoking saccades at different depths would be tested, but I understand that this can be difficult at this stage. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (19) I keep repeating this because in general, cognitive effects are stronger in the intermediate/deeper layers than in the superficial layers. If these interfere with eye movements like arousal, then why should arousal be different?

      Few studies have investigated the effects of attention on “pure” movement SC neurons that only discharge during a saccade. One study, which we cite in the Introduction (Ignashchenkova et al., 2004, Nat. Neurosci.), found significant differences in spiking responses between trials with and without attentional cueing for visual and visuomotor neurons. No significant difference was found for motor neurons, consistent with our hypothesis that signals related to cognition and arousal are kept separate from saccade-related signals in the SC.

      (20) The problem with Figure 5 and its related text is that the neurons with low motor index are additionally visual. So, of course, they can be modulated if the pupil diameter changes!

      Addressed in Public Reviews, Reviewer 3, Point 2.  

      (21) I had a hard time understanding Figure 6. 

      See reply above, Public Reviews, Reviewer 1, Point 4.

      (22) Line 586: these cells have more visual responses and will be affected by the amount of light entering the eye. 

      Addressed in Public Reviews, Reviewer 3, Point 2.

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the Authors:

      Reviewer #1:

      We think that this manuscript makes an important contribution that will be of interest to statistical physicists and to researchers in (microbiota) ecology and (biological) data science. The evidence for their results is solid, and the work improves the state of the art in terms of methods. We have a few concerns that, in our opinion, the authors should address.

      Major concerns:

      (1) While the paper could be of interest to the broad audience of eLife, the way it is written makes it accessible mainly to physicists. We encourage the authors to take the broad audience into account by i) explaining better the essence of what is being done at each step, ii) highlighting the relevance of the method compared to other methods, and iii) discussing the ecological implications of the results.

      Examples of how to approach i) include: modifying or expanding Figure 1 so that non-familiar readers can understand the summary of the work (e.g. with cartoons representing communities, diseased states and bacterial interactions and their relationship with the inference method); in each section, summarizing at the beginning the purpose of what is going to be addressed, and summarizing at the end what the section has achieved; and, in Figure 2, replacing symbols with their meaning as much as possible. The same applies to Figure 1, at the very least in the figure caption.

      Example of how to approach ii): Since the authors aim to establish a bridge between disordered systems and microbiome ecology, it could be useful to expand the introduction to disordered systems a bit for biologists/biophysicists. This could be done with an additional text box, which could also highlight the advantages of this approach in comparison to other techniques (e.g. model-free approaches can also classify healthy and diseased states).

      Example of how to approach iii): The authors could discuss the ecological implications of their results in more depth. For example, do they have a hypothesis on why demographic and neutral effects could dominate in healthy patients?

      We thank the reviewer for these observations. Following the suggestion, in the revised version each section outlines at the beginning the goal of what will be addressed and summarizes at the end what has been achieved. We also updated Figure 1 and Figure 2.

      (i) For Figure 1, we expanded the figure and, we hope, made clearer how we conceptualize the problem, use the data, and establish our method. In Figure 2, we enriched the y-axis labels of each panel with the name of the associated order parameter.

      (ii) We thank the reviewer for helping us improve the readability of the introductory part and thus provide more insight into disordered systems techniques for a broader audience. We have added explanations at the end of page 2 describing the advantages of this methodology compared to other strategies and models.

      (iii) We thank the reviewer for raising the need for a more in-depth ecological discussion of our results. A simple way to understand why neutral effects may dominate in healthy individuals is the following. Neutrality implies that differences between species are mainly shaped by stochastic processes such as demographic noise, with species treated as different realizations of the same underlying stochastic ecological dynamics. In our analysis, we observe that healthy individuals tend to exhibit highly similar microbial communities, suggesting that the compositional variability among their microbiomes is compatible, at least in part, with the fluctuations expected from demographic stochasticity alone. In contrast, patients with the disease display significantly more heterogeneous microbial compositions, whose diversity and structure cannot be satisfactorily explained by neutral demographic fluctuations alone.

      This discrepancy implies that additional deterministic forces—such as altered ecological interactions—are driving the divergence observed in dysbiotic states. In diseased individuals, the breakdown of such interactions leads to a structurally distinct regime that may correspond to a phase of marginal stability, as indicated by our theoretical modeling. This shift marks a transition from a community governed by neutrality and demographic noise to one dominated by non-neutral ecological forces (as depicted in Figure 4). We added these comments in the discussion section of the revised manuscript.

      (2) Taking into account the broader audience, we invite the authors to edit the abstract, as it seems to jump from one ecological concept to another without explicitly communicating what is the link between these concepts. From the first two sentences, the motivation seems to be species diversity, but no mention of diversity comes after the second sentence. There is no proper introduction/definition of what macroecological states are. After that, the authors switch to healthy and unhealthy states, without previously introducing any link between gut microbiota states and the host’s health (which perhaps could be good in the first or second sentence, although other framings can be as valid). After that, interactions appear in the text and are related to instability, but the reader might not know whether this is surprising or if healthy/unhealthy states are generally related to stability.

      We pointed out a few examples, but the authors could extend their revision on i), ii) and iii) beyond such specific comments. In our opinion, this would really benefit the paper.

      In response to the reviewer’s concern about conceptual clarity and structure, we substantially revised the abstract to improve its accessibility and logical flow. The revised abstract now links species diversity to microbiome structure and function from the outset, addressing the initial confusion. We provide a concise definition of “macroecological states”, framing them as reproducible statistical patterns that reflect community-level properties. The revised version also connects gut microbiome states to host health earlier, resolving the previous abrupt shift in focus. Finally, we conclude by highlighting how disordered systems theory advances our understanding of microbiome stability and functioning, reinforcing the novelty and broader significance of our approach. Overall, the revised abstract better serves a broad interdisciplinary audience, including readers unfamiliar with the technicalities of disordered systems or microbial ecology, while preserving the scientific depth and accuracy of our work.

      (3) The connection with consumer-resource (CR) models is quite unusual. In Equation (12), why do the authors assume that the consumption term does not depend on R? This should be addressed, since this term is usually dependent on R in microbial ecology models.

      In case this is helpful, it is known that the symmetric Lotka-Volterra model emerges from time-scale separation in the MacArthur model, where resources reproduce logistically and are consumed by other species (e.g., plants eaten by herbivores). Consumer-resource models form a broad category, while the MacArthur model is a specific case featuring logistic resource growth. For microbes, a more meaningful justification of the generalized Lotka-Volterra (GLV) model from a consumer-resource perspective involves consumer-resource dynamics in a chemostat, where time-scale separation is assumed and higher-order interactions are neglected. See, for example: (a) the classic paper by MacArthur: R. MacArthur, Species packing and competitive equilibrium for many species, Theoretical Population Biology 1(1):1-11, 1970; (b) recent works on time-scale separation in chemostat consumer-resource models: Anna Posfai et al., PRL, 2017; Sireci et al., PNAS, 2023; Akshit Goyal et al., PRX Life, 2025.

      We thank the reviewer for this observation. We apologize for the typo in the main text, which we have now corrected. The consumer-resource model we had in mind is the classical case proposed by MacArthur, in which resources are self-regulated by logistic growth; this leads to the generalized Lotka-Volterra model we employ in our work.
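      For readers less familiar with this argument, the time-scale separation can be written out explicitly. The notation below (consumers N_i, resources R_a, consumption rates c_{ia}, conversion efficiencies w_a, resource growth rates r_a and carrying capacities K_a, mortality rates m_i) is generic textbook notation assumed for this sketch; it is not taken from Equation (12) of the manuscript. Consumption enters as c_{ja} N_j R_a, i.e. it depends on R_a, as the reviewer points out.

```latex
% MacArthur consumer-resource model with logistic resource growth
\begin{aligned}
\frac{dN_i}{dt} &= N_i\Big(\sum_a w_a c_{ia} R_a - m_i\Big), \\
\frac{dR_a}{dt} &= \frac{r_a}{K_a}\,R_a\,(K_a - R_a) - \sum_j c_{ja} N_j R_a .
\end{aligned}

% Fast resources: set dR_a/dt = 0 with R_a > 0
R_a^{*} = K_a\Big(1 - \frac{1}{r_a}\sum_j c_{ja} N_j\Big)

% Substituting R_a^{*} into dN_i/dt yields a generalized Lotka-Volterra model
\frac{dN_i}{dt} = N_i\Big(\kappa_i - \sum_j \alpha_{ij} N_j\Big), \qquad
\kappa_i = \sum_a w_a c_{ia} K_a - m_i, \qquad
\alpha_{ij} = \sum_a \frac{w_a K_a}{r_a}\, c_{ia} c_{ja} = \alpha_{ji}.
```

      The effective interaction matrix is symmetric by construction, which is why this route yields the symmetric Lotka-Volterra model. The reduction holds only while every R_a^{*} stays positive, i.e. no resource is driven to extinction.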

      Minor concerns:

      (1) The title has a nice pun for statistical physicists, but we wonder if it can be a bit confusing for the broader audience of eLife. Although we leave this to the authors’ decision, we’d recommend considering changing the title to make it more explicit in communicating the main contribution/result of the work.

      Following the reviewer’s suggestion, we have introduced an explanatory subtitle: “Linking Species Interactions to Dysbiosis through a Disordered Lotka-Volterra Framework”.

      (2) Review the references - some preprints might have already been published: Pasqualini J. 2023, Sireci 2022, Wu 2021.

      We thank the reviewer for pointing our attention to this inaccuracy. We updated the references to Pasqualini and Sireci papers. To our knowledge, Wu’s paper has appeared as an arXiv preprint only.

      (3) Species do not generally exhibit identical carrying capacities (see Grilli, Nat. Commun., 2020); some taxa are generally more abundant than others. The authors could discuss whether the model, with the inferred parameters, can accurately reproduce the distribution of species’ mean abundances.

      We thank the reviewer for this insightful comment. As discussed in the revised manuscript (lines 294–299), our current model does not accurately reproduce the empirical species abundance distribution (SAD). This limitation stems from the assumption of constant carrying capacities across species. Empirical observations (e.g., Grilli et al., Nat. Commun., 2020 [1]) show heterogeneous mean abundances, often following power-law or log-normal distributions, whereas our model, with a constant carrying capacity, yields SADs without fat tails, which diverge from the empirical data.

      This simplification preserves the analytical tractability of the disordered generalized Lotka-Volterra (dGLV) framework, a common approach also found in prior works such as Bunin (2017) and Barbier et al. (2018) [2, 3]. Introducing heterogeneity in carrying capacities, for instance by drawing them from a log-normal distribution, or switching to multiplicative (rather than demographic) noise, could indeed produce SADs that better align with empirical data. Nevertheless, implementing such changes would significantly complicate the analytical treatment.

      We regard these directions as promising avenues for future research: they could enhance the empirical realism of the model and its capacity to capture observed macroecological patterns, while posing new theoretical challenges for the analysis of disordered systems.

      (4) A substantial number of cited works (Grilli, Nat. Commun., 2020; Zaoli & Grilli, Science Advances, 2021; Sireci et al., PNAS, 2023; Po-Yi Ho et al., eLife, 2022) suggest that environmental fluctuations play a crucial role in shaping microbiome composition and dynamics. Is the authors’ analysis consistent with this perspective? Do they expect their conclusions to remain robust if environmental fluctuations are introduced?

      We thank the reviewer for stressing this point. Introducing environmental fluctuations into the model formally violates detailed balance, which prevents the definition of an energy function. To date, no study has integrated random interactions with both demographic and environmental noise within a unified analytical framework. This is certainly a highly promising direction, which some of the authors are already exploring. However, given the inherently out-of-equilibrium nature of the system and the absence of a free energy, it would require a Dynamical Mean-Field Theory formalism, followed by a self-consistent solution of the corresponding stationary equations. We have nonetheless added a brief note on this point in the Discussion section.

      (5) The term “order parameters” may not be intuitive for a biological audience. In any case, the authors should explicitly define each order parameter when first introduced.

      We thank the reviewer for the comment. We now name each order parameter where it is first introduced, together with a brief explanation of its meaning that should be accessible to readers with a biological background.

      (6) Line 242: Should ψU be ψD?

      We thank the reviewer for the observation. We corrected the typo.

      (7) Given that the authors are discussing healthy and diseased states and to avoid confusion, the authors could perhaps use another word for ’pathological’ when they refer to dynamical regimes (e.g., in Appendix 2: ’letting the system enter the pathological regime of unbounded growth’).

      We thank the reviewer for the helpful comment. As suggested, we used the term “unphysical” instead of “pathological” where needed.

      Reviewer #2:

      (1) A technical point that I could not understand is how the authors deal with compositional data. One reason for my confusion is that the order parameters h and q0 are fixed in the data to 1/S and 1/S², and thus I do not see how they can be informative. The same holds for the carrying capacity: why is it not 1 when considering relative abundances?

      We thank the reviewer for raising this point. We acknowledge that the treatment of compositional data and the interpretation of order parameters h and q0 were not sufficiently clarified in the manuscript. Additionally, there was an imprecision in the text regarding the interpretation of these parameters.

      As defined in revised Eq. (4) of the manuscript, h and q0 are averaged over the entire dataset, summing across samples α; in those expressions, S_α denotes the number of species present in sample α and ⟨·⟩ the average over samples. These parameters are therefore informative: they encapsulate sample-level ecological diversity, and their variation reflects biological differences between healthy and diseased states. For instance, Pasqualini et al., 2024 [4] reported significant differences in these metrics between health conditions, supporting their ecological relevance.

      Regarding carrying capacities, although we work with relative abundance data (i.e., compositional data), we do not fix the carrying capacity K to 1. Instead, in each sample we set K to the maximum relative abundance x_i observed in that sample, to preserve compatibility with the empirical data and allow for coexistence. While this remains a modeling assumption, it improves ecological realism within the constraints of the disordered GLV framework.
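
      A minimal sketch of the two preprocessing steps described above, on toy numbers. It assumes each sample is a relative-abundance vector summing to 1 and takes the per-sample values 1/S_α and 1/S_α² quoted by the reviewer at face value; the exact expressions used in the manuscript are those of revised Eq. (4). The function names and the two cohorts are hypothetical. The point it illustrates is that, although each sample's value is set by its richness, richness varies across samples, so the sample averages differ between cohorts.

```python
import numpy as np


def check_compositional(sample: np.ndarray) -> None:
    """Raise if a sample is not a non-negative vector summing to 1."""
    if np.any(sample < 0) or not np.isclose(sample.sum(), 1.0):
        raise ValueError("sample must be non-negative and sum to 1")


def order_parameters(samples: list[np.ndarray]) -> tuple[float, float]:
    """Return the sample averages <1/S_alpha> and <1/S_alpha**2> (hypothetical helper).

    S_alpha is the number of species with non-zero abundance in sample alpha.
    The per-sample values 1/S and 1/S**2 follow the reviewer's expressions;
    averaging them over samples is the step described in the reply.
    """
    for x in samples:
        check_compositional(x)
    richness = np.array([np.count_nonzero(x) for x in samples], dtype=float)
    return float(np.mean(1.0 / richness)), float(np.mean(1.0 / richness**2))


def carrying_capacities(samples: list[np.ndarray]) -> np.ndarray:
    """Per-sample K, set to the largest relative abundance in that sample."""
    return np.array([float(np.max(x)) for x in samples])


# Two hypothetical cohorts that differ in how many species are present.
cohort_a = [np.array([0.25, 0.25, 0.25, 0.25]), np.array([0.5, 0.3, 0.2, 0.0])]
cohort_b = [np.array([0.9, 0.1, 0.0, 0.0]), np.array([0.6, 0.4, 0.0, 0.0])]

print(order_parameters(cohort_a))
print(order_parameters(cohort_b))     # (0.5, 0.25)
print(carrying_capacities(cohort_b))  # [0.9 0.6]
```

      The second cohort has fewer species per sample, so both averages are larger; K is simply the largest relative abundance in each sample, which is below 1 whenever more than one species is present.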

      (2) Obviously I’m missing something, so it would be nice to clarify in simple terms the logic of the argument. I understand that Lagrange multipliers are going to be used in the model analysis, and there are a lot of technical arguments presented in the paper, but I would like a much more intuitive explanation about the way the data can be used to infer order parameters if those are fixed by definition in compositional data.

      We thank the reviewer for the observation. As explained above, the order parameters can be measured directly from the data even in the presence of compositionality. They can also be connected to the theory for compositional data, because the only effect of the compositionality constraint is to shift the linear coefficient in the Hamiltonian, which is equivalent to shifting the average interaction µ. The resulting phase diagram, however, is mostly affected by the interaction variance σ², since µ is such that the system lies in the bounded phase.
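
      The algebra behind the statement that compositionality only shifts the linear coefficient can be checked numerically on a toy quadratic energy. The functional form below, with a linear term, self-regulation, a mean-interaction term (µ/2S)(Σx_i)², and a symmetric random-coupling term scaled by σ/√S, is an assumption chosen for illustration and is not the manuscript's Hamiltonian. On the simplex Σx_i = 1 one has (Σx_i)² = Σx_i, so raising µ by Δ is indistinguishable from lowering the linear coefficient by Δ/(2S), while the σ term is untouched.

```python
import numpy as np


def toy_energy(x, c, mu, sigma, z):
    """Toy quadratic energy for S species (hypothetical form).

    -c * sum(x)                          linear coefficient
    + 0.5 * sum(x**2)                    self-regulation
    + 0.5 * (mu / S) * sum(x)**2         mean interaction mu
    + 0.5 * (sigma / sqrt(S)) * x^T z x  random interactions, z symmetric
    """
    S = x.size
    return (
        -c * x.sum()
        + 0.5 * np.sum(x**2)
        + 0.5 * (mu / S) * x.sum() ** 2
        + 0.5 * (sigma / np.sqrt(S)) * (x @ z @ x)
    )


rng = np.random.default_rng(1)
S = 50
g = rng.standard_normal((S, S))
z = (g + g.T) / np.sqrt(2)       # symmetric random couplings
x = rng.dirichlet(np.ones(S))    # a point on the simplex, sum(x) == 1

c, mu, sigma, delta = 1.0, 0.5, 0.8, 3.0
# On the simplex, raising mu by delta equals lowering c by delta / (2 S).
e_mu_shift = toy_energy(x, c, mu + delta, sigma, z)
e_c_shift = toy_energy(x, c - delta / (2 * S), mu, sigma, z)
print(np.isclose(e_mu_shift, e_c_shift))  # True
```

      Off the simplex (e.g., for 2x) the two shifts no longer coincide, so the equivalence relies on the compositional constraint itself; σ, which governs the random part, never enters the identity.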

      (3) Another point that I did not understand comes from the fact that the authors claim that interaction variance is smaller in unhealthy microbiomes. Yet they also find that those are closer to instability, and are more driven by niche processes. I would have expected the opposite to be true, more variance in the interactions leading to instability (as in May’s original paper for instance). Is this apparent paradox explained by covariations in demographic stochasticity (T) and immigration rate (lambda)? If so, I think it would be very useful to comment on that.

      As Altieri and coworkers showed in their PRL (2021) [5], the phase diagram of our model differs fundamentally from that of Biroli et al. (2018) [6]. In the latter, the intuitive rule that greater interaction variance yields greater instability indeed holds. For clarity, we attach below the phase diagram obtained by Altieri et al.

      The apparent paradox arises because the two phase diagrams are tuned by different parameters. Consequently, even at low temperature and with weak interaction variance, our system may lie closer to the replica-symmetry-breaking (RSB) line.

      Moreover, Fig. 3 in the main text is not a (σ, T) phase diagram in which all other parameters are kept constant. Rather, it plots the σ and T values inferred from the data (without showing the corresponding µ).

      To capture the full, non-trivial influence of all parameters on stability, we studied the so-called replicon eigenvalue in the replica-symmetric (RS, i.e., single-equilibrium) approximation. This leading eigenvalue measures how close a given set of inferred parameters, and hence a microbiome, is to the RSB threshold. These results are shown in Figure 4.

      Author response image 1.

      (4) What do the empirical SAD look like? It would be nice to see the actual data and how the theoretical SADs compare.

      The empirical species abundance distributions (SADs) analyzed in our study are presented and discussed in detail in Pasqualini et al., 2024 [4]. Given the overlap in content, we chose not to reproduce these figures in the current manuscript to avoid redundancy.

      As we also clarify in the revised text, the theoretical SADs derived from the disordered generalized Lotka-Volterra (dGLV) model in the unique-fixed-point phase typically exhibit exponential tails. These distributions do not match the heavier-tailed patterns (e.g., log-normal or power-law-like) observed in empirical microbiome data. This discrepancy stems from the simplifying assumptions of the dGLV framework, including the use of constant carrying capacities and demographic noise.

      In the revised manuscript, we have added a brief discussion that explicitly acknowledges this limitation and highlights possible refinements of the model, such as incorporating heterogeneous carrying capacities or exploring alternative noise structures.
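
      To make the tail distinction concrete, the sketch below compares the survival function P(X > x) of an exponential distribution with that of a log-normal distribution of the same mean. Both distributions and their parameters (mean 1, log-scale standard deviation 1.5) are hypothetical illustrations, not fits to the data or the dGLV SAD itself; they only show how much more probability a log-normal places on rare, highly abundant species.

```python
import math


def exponential_survival(x: float, mean: float) -> float:
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)


def lognormal_survival(x: float, mean: float, s: float) -> float:
    """P(X > x) for a log-normal with the given mean and log-scale std s."""
    mu = math.log(mean) - 0.5 * s**2  # chosen so that E[X] == mean
    return 0.5 * math.erfc((math.log(x) - mu) / (s * math.sqrt(2.0)))


# Hypothetical parameters: both SADs have mean abundance 1.
for x in (5.0, 10.0, 20.0):
    ratio = lognormal_survival(x, 1.0, 1.5) / exponential_survival(x, 1.0)
    print(f"x = {x:>4}: log-normal / exponential tail ratio = {ratio:.3g}")
```

      The ratio grows rapidly with x: an exponential tail makes very abundant species vanishingly rare, which is why it cannot reproduce the fat-tailed SADs seen in the data.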

      (5) Some typos: often “niche” is written “nice”.

      We thank the reviewer for this suggestion. After inspecting the text, we corrected the reported typos.

      Reviewer #3:

      Major comments:

      (1) In the S3 text, the authors say that filtered metagenomic reads were processed using the software Kaiju. The description of the pipeline does not mention how core genes were selected, which is often a crucial step in determining the abundance of a species in a metagenomic sample. In addition, the senior author of this manuscript has published a version of Kaiju that leverages marker-gene classification methods (deemed Core-Kaiju; Tovo et al., 2020), but it was not used for either this manuscript or Pasqualini et al. (2024). I am not suggesting that the data necessarily need to be reprocessed, but it would be useful to know how core genes were chosen in Pasqualini et al. (2024) and why Core-Kaiju was not used.

      Before the current manuscript and the PLOS Computational Biology paper by Pasqualini et al. [4], we applied the core-Kaiju protocol to the same dataset used in both studies. However, this tool was originally developed and validated using general catalogs of culturable organisms and is not specifically tuned for gut microbiomes. As a result, we found that in many samples core-Kaiju retained only very few species (in some samples as few as 5–10), undermining the reliability of the analysis. Because of these limitations, we opted for the standard version of Kaiju. We are actively developing an improved version of the core-Kaiju protocol that overcomes these limitations, and preliminary results (not shown here) indicate that the observed patterns are robust in this case as well.

      (2) My understanding of Pasqualini et al. (2024) was that diseased patients experienced larger fluctuations in abundance, while in this study they had smaller fluctuations (Figure 3a). Is this a discrepancy between the two models, or is there a more nuanced interpretation?

      We thank the reviewer for the observation. The discrepancy is only apparent, as the term fluctuation has different meanings in the two contexts. The fluctuations referred to by the reviewer correspond to a parameter of our theory, namely the noise in the interactions, whereas in Pasqualini et al. σ denotes environmental fluctuations. There is thus no conceptual discrepancy between the results: in both studies, unhealthy microbiomes were found to be less stable. Indeed, Fig. 4 of this study shows that unhealthy microbiomes lie closer to the RSB line, a proximity that is also associated with enhanced fluctuations.

      (3) Lines 38–41: It would be helpful to explicitly state what “interaction patterns” are being referenced here. The final sentence could also be clarified. Do microbiomes “host” interactions, or are they better described as a property (“have”, “harbor”)? The word “host” may confuse some readers since it is often used to refer to the human host. I am also not sure what point is being made by “expected to govern natural ones”. There are interactions between members of a microbiome; experimental studies have characterized some of these interactions, which we expect to relate in some way to interactions in nature. Is this what the authors are saying?

      Thank you. We agree that this sentence was unclear. We are referring to pairwise species interactions, not to host-microbiome interactions. We have rewritten this part as follows: “In fact, recent work shows that the network-level properties of species-species interactions (for example, the sign balance, average strength, and connectivity of the inferred interaction matrix) shift systematically between healthy and dysbiotic gut communities (see, for instance, [7, 8]). Pairwise species interactions have been quantified in simplified in vitro consortia [9, 10]; we assume that the same classes of interactions also operate, albeit in a more complex form, in the native gut microbiome.”

      (4) Line 43: I appreciate that the authors separated neutral vs. logistic models here.

      (5) Lines 51-75: The framing here is well-written and convincing. Network inference is an ongoing, active subject in ecology, and there is an unfortunate focus on inferring every individual interaction because ecologists with biology backgrounds are not trained to think about the problem in the language of statistical physics.

      We thank the reviewer for these positive comments.

      (6) Line 87: Perhaps I’m missing something obvious, but I don’t see how ρ_i sets the intrinsic timescale of the dynamics when its units are 1/(time*individuals), assuming the dimensions of r_i are inverse time.

      We thank the reviewer for the observation. We corrected this phrase in the main text.

      (7) Lines 189-190: “as close as possible to the data” it would aid the reader if you specified the criteria meant by this statement.

      We thank the reviewer for the observation. We removed the sentence, as it introduced some redundancy in our argument. The proposed method is described in detail in the subsequent text.

      (8) Line 198: It would aid the reader if you provided some context for what the T - σ plane represents.

      We thank the referee for the helpful suggestion. We have clarified the respective roles of the demographic noise amplitude and of the strength of the random interaction matrix, as theoretically predicted by Altieri and coworkers in their PRL (2021) [5]. Please see the additional paragraph on page 6 of the resubmitted version.

      (9) Line 217: Specifying what is meant by “internal modes” would aid the typical life science reader.

      We thank the reviewer for the suggestion. Recognizing that “internal modes” might cause confusion when describing the SAD shape in that context, we replaced it with “peaks”.

      (10) Line 219: Some additional justification and clarification are needed here, as some may think of “m” as being biomass.

      We added a sentence to better explain this concept: “In classical and quantum field theory, the particle-particle interaction embedded in the quadratic term is typically referred to as a mass source. In the context of this study, m captures quadratic fluctuations of species abundances, as also appearing in the expression of the leading eigenvalue of the stability matrix.”

      Minor comments:

      (1) I commend the authors for removing metagenomic reads that mapped to the human genome in the preprocessing stage of their pipeline. This may seem like an obvious pre-processing step, but it is unfortunately not always implemented.

      We thank the referee for pointing out this potential issue. The data used in this work, as well as the bioinformatic workflow used to generate them, have been described in detail in Pasqualini et al., 2024 [4]. One of the main preprocessing steps is the removal of reads mapping to the human genome.

      (2) Line 13: “Bacterial” excludes archaea, and while you may not have many high-abundance archaea in your human gut data, this sentence does not specify the human gut. Usually, this exclusion is averted via the term “microbial”, though sometimes researchers raise objections to the term when the data does not include fungal members (e.g., all 16S studies).

      We thank the reviewer for this suggestion. To include archaea, we now use the term “microbial” instead of “bacterial”.

      (3) Line 18: This manuscript is being submitted under the “Physics of Living Systems” track, but it may be useful to explicitly state in the Abstract that disordered systems are a useful approach for understanding large, complex communities, for the benefit of life science researchers coming from a biology background.

      Thank you. We have modified the abstract following this suggestion.

      (4) Line 68: Consider using “adapted” or something similar instead of “mutated” if there is no specific reason for that word choice.

      We thank the reviewer for this suggestion, which was implemented in the text.

      (5) Line 111: It would be useful to define annealed and quenched for a general life science audience.

      We thank the reviewer for this suggestion. In the “Results” section, we have opted for “time-dependent disordered interactions” to reach a broader audience and avoid any jargon. Moreover, in the Discussion we added a detailed footnote: “In contrast to the quenched approximation, the annealed version assumes that the random couplings are not fixed but instead fluctuate over time, with their covariance governed by independent Ornstein–Uhlenbeck processes.”
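
      For readers less familiar with the distinction, a minimal sketch of annealed couplings: each random coupling fluctuation is modeled as an independent Ornstein–Uhlenbeck process and integrated with the Euler–Maruyama scheme. The correlation time τ, the stationary standard deviation s, the time step, and the helper name are hypothetical, and the zero-mean fluctuation would sit on top of whatever mean interaction the model uses. In the quenched case the couplings would simply stay at their initial values.

```python
import numpy as np


def ou_couplings(n_pairs, tau, s, dt, n_steps, seed=0):
    """Simulate independent Ornstein-Uhlenbeck fluctuations of random couplings.

    Each fluctuation a follows da = -(a / tau) dt + sqrt(2 s^2 / tau) dW,
    integrated with Euler-Maruyama. Its stationary law is Normal(0, s^2) and
    its autocorrelation decays as exp(-t / tau). Returns initial and final values.
    """
    rng = np.random.default_rng(seed)
    a0 = rng.normal(0.0, s, size=n_pairs)  # start from the stationary law
    a = a0.copy()
    kick = np.sqrt(2.0 * s**2 * dt / tau)
    for _ in range(n_steps):
        a = a - (a / tau) * dt + kick * rng.standard_normal(n_pairs)
    return a0, a


tau, s, dt = 1.0, 0.3, 0.01
a0, a1 = ou_couplings(n_pairs=20_000, tau=tau, s=s, dt=dt, n_steps=100)  # t = tau
print(np.var(a1) / s**2)           # close to 1: the variance is preserved
print(np.corrcoef(a0, a1)[0, 1])   # close to exp(-1), about 0.37
```

      After a time τ the correlation with the initial couplings has decayed to roughly e⁻¹, whereas quenched couplings would keep a correlation of exactly 1; in both cases the spread of the couplings stays at s.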

      (6) Line 124: Likewise for the replicon sector.

      We thank the reviewer for the suggestion. We added a footnote on page 4, after the formula, to highlight the physical intuition behind the introduction of the replicon mode.

      “The replicon eigenvalue refers to a particular type of fluctuation around the saddle-point (mean-field) solution within the replica framework. When the Hessian matrix of the replicated free energy is diagonalized, fluctuations split into three sectors: longitudinal, anomalous, and replicon. The replicon mode is the most sensitive to criticality: its vanishing signals the emergence of many nearly degenerate states. It essentially describes how ‘soft’ the system is to microscopic rearrangements in configuration space.”

      (7) Figure 2: It would be helpful to include y-axis labels for each order parameter alongside the mathematical notation.

      We thank the reviewer for this suggestion. The y-axis of Figure 2 now includes, alongside each mathematical symbol, the name of the quantity represented.

      (8) Line 242: Subscript “U” is used to denote “Unhealthy” microbiomes, but “D” is used to denote “Diseased” in Figs. 2 and 3 (perhaps elsewhere as well).

      We thank the reviewer for this observation. After checking the subscripts in the text, we homogenized our notation consistently with Figures 2 and 3, adopting the subscript “D” for symbols related to the diseased/unhealthy condition.

      (9) Line 283: “not to” should be “not due to”

      We thank the reviewer for this suggestion. After inspecting the text, we corrected the reported error.

      (10) Equations 23, 34: Extra “=” on the RHS of the first line.

      We consistently follow the same formatting for all line breaks in the equations throughout the text.

      We are thus resubmitting our paper, in the hope that we have satisfactorily addressed all of the referees’ concerns.

      References

      (1) Jacopo Grilli. Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1):4743, 2020.

      (2) Guy Bunin. Ecological communities with Lotka-Volterra dynamics. Physical Review E, 95(4):042414, 2017.

      (3) Matthieu Barbier, Jean-François Arnoldi, Guy Bunin, and Michel Loreau. Generic assembly patterns in complex ecological communities. Proceedings of the National Academy of Sciences, 115(9):2156–2161, 2018.

      (4) Jacopo Pasqualini, Sonia Facchin, Andrea Rinaldo, Amos Maritan, Edoardo Savarino, and Samir Suweis. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9):e1012482, 2024.

      (5) Ada Altieri, Felix Roy, Chiara Cammarota, and Giulio Biroli. Properties of equilibria and glassy phases of the random Lotka-Volterra model with demographic noise. Physical Review Letters, 126(25):258301, 2021.

      (6) Giulio Biroli, Guy Bunin, and Chiara Cammarota. Marginally stable equilibria in critical ecosystems. New Journal of Physics, 20(8):083051, 2018.

      (7) Amir Bashan, Travis E Gibson, Jonathan Friedman, Vincent J Carey, Scott T Weiss, Elizabeth L Hohmann, and Yang-Yu Liu. Universality of human microbial dynamics. Nature, 534(7606):259–262, 2016.

      (8) Marcello Seppi, Jacopo Pasqualini, Sonia Facchin, Edoardo Vincenzo Savarino, and Samir Suweis. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1):5, 2023.

      (9) Jared Kehe, Anthony Ortiz, Anthony Kulesa, Jeff Gore, Paul C Blainey, and Jonathan Friedman. Positive interactions are common among culturable bacteria. Science Advances, 7(45):eabi7159, 2021.

      (10) Ophelia S Venturelli, Alex V Carr, Garth Fisher, Ryan H Hsu, Rebecca Lau, Benjamin P Bowen, Susan Hromada, Trent Northen, and Adam P Arkin. Deciphering microbial interactions in synthetic human gut microbiome communities. Molecular Systems Biology, 14(6):e8157, 2018.